Input types explained

When you upload a file to import into a dataset, you are asked to choose an input type. This choice tells Lettria what kind of text to expect, which helps it analyze your text more accurately.

Here is how each input type is handled.

Conversation

For conversations, we need to determine which speaker says each sentence. With that information, we can resolve coreferences further and assign each coreference cluster to the right speaker.

Speaker information may already be included in the speech-to-text result. In that case, no further analysis is needed.

If speaker information is not provided, we attempt to detect dialogue through text analysis.

Example

Hello.
What's happening here?
Who are you?
I am John! → speaker 1
Oh, and I am Kevin! → speaker 2
Great. Bye.
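The decision described above (use speaker labels from the speech-to-text result when they exist, otherwise analyze the text) can be sketched in Python. This is a hypothetical illustration, not Lettria's implementation: the segment format (dictionaries with `text` and an optional `speaker`), the function name `assign_speakers`, and the naive "I am <Name>" heuristic, which only mirrors the example above, are all assumptions.

```python
import re
from typing import Dict, List, Optional

# Hypothetical fallback heuristic: a sentence such as "I am John!" introduces
# a speaker. Real dialogue detection needs much richer text analysis.
SELF_INTRO = re.compile(r"\bI am ([A-Z][a-z]+)")


def assign_speakers(segments: List[Dict[str, Optional[str]]]) -> List[Optional[str]]:
    """Return one speaker label per segment (None when unknown).

    Each segment is a hypothetical dict with a "text" key and an optional
    "speaker" key coming from a speech-to-text result.
    """
    # Speaker labels already provided for every segment: no further analysis.
    if segments and all(seg.get("speaker") for seg in segments):
        return [seg["speaker"] for seg in segments]

    # Otherwise, fall back to (very naive) text analysis.
    names: List[str] = []
    speakers: List[Optional[str]] = []
    for seg in segments:
        match = SELF_INTRO.search(seg.get("text") or "")
        if match:
            name = match.group(1)
            if name not in names:
                names.append(name)
            speakers.append(f"speaker {names.index(name) + 1}")
        else:
            speakers.append(None)  # speaker unknown for this sentence
    return speakers


example = ["Hello.", "What's happening here?", "Who are you?",
           "I am John!", "Oh, and I am Kevin!", "Great. Bye."]
print(assign_speakers([{"text": t} for t in example]))
# [None, None, None, 'speaker 1', 'speaker 2', None]
```

The heuristic only labels self-introductions and leaves the other sentences as `None`; attributing those would require further analysis (turn-taking, questions and answers, coreferences).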

Report

For reports, we attempt to build a hierarchy between chunks by looking at indentation and by analyzing the text.

Structure detection using parts and sub-parts

Examples

1 2 3
1 1.1 1.2
A) B) C)
1.a 1.b 2.a 2.b
I II III IV
Introduction, Section 1, Section 2, Conclusion
Chapter 2, Paragraph 4
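As a rough illustration, the numbering styles listed above can be recognized with regular expressions. The sketch below is hypothetical: the pattern list, the style names, and `classify_marker` are assumptions rather than Lettria's API, and it ignores ambiguous cases such as a sentence that happens to start with the word "I".

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, one per numbering style listed above.
# Order matters: more specific patterns are tried first.
MARKER_PATTERNS = [
    ("dotted", re.compile(r"^(\d+(?:\.(?:\d+|[a-z]))+)\b")),  # 1.1, 1.2.3, 1.a
    ("letter", re.compile(r"^([A-Z])\)")),                    # A) B) C)
    ("roman", re.compile(r"^([IVXLC]+)\b")),                  # I II III IV
    ("number", re.compile(r"^(\d+)\b")),                      # 1 2 3
    ("named", re.compile(
        r"^(Introduction|Conclusion|(?:Chapter|Section|Paragraph)\s+\d+)\b")),
]


def classify_marker(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (style, marker, depth) for a heading-like line, or None."""
    text = line.strip()
    for style, pattern in MARKER_PATTERNS:
        match = pattern.match(text)
        if match:
            marker = match.group(1)
            # Depth = number of dot-separated components, so "1.2.3" -> 3.
            depth = marker.count(".") + 1 if style == "dotted" else 1
            return style, marker, depth
    return None


for heading in ["1.2 Scope", "1.a Details", "A) First", "IV Results", "Chapter 2"]:
    print(heading, "->", classify_marker(heading))
```

Lines that share a style and depth can then be treated as siblings, and deeper markers (such as 1.1 under 1) as sub-parts.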

Structure detection using ordered and unordered lists

Example

This is what we need:
- Milk
- Sugar
- Flour
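A minimal sketch of list detection, assuming a list is a run of lines starting with a bullet ("-", "*", "•") or a number followed by "." or ")", introduced by the last non-empty line before the first item. The marker set and the `parse_list` function are hypothetical, not how Lettria actually detects lists.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical item markers: bullets for unordered lists, "1." or "1)" for ordered ones.
UNORDERED = re.compile(r"^\s*[-*•]\s+(.*)$")
ORDERED = re.compile(r"^\s*\d+[.)]\s+(.*)$")


def parse_list(lines: List[str]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (intro, items), where each item is (kind, text)."""
    intro: Optional[str] = None
    items: List[Tuple[str, str]] = []
    for line in lines:
        unordered = UNORDERED.match(line)
        ordered = ORDERED.match(line)
        if unordered:
            items.append(("unordered", unordered.group(1).strip()))
        elif ordered:
            items.append(("ordered", ordered.group(1).strip()))
        elif not items and line.strip():
            # The last non-empty line before the first item introduces the list.
            intro = line.strip()
    return intro, items


print(parse_list(["This is what we need:", "- Milk", "- Sugar", "- Flour"]))
```

The intro line becomes the parent chunk and each item a child, which is how a list adds one level to the hierarchy.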

Review

A review is a short text in which the speaker or writer expresses their opinion about something.

Tweet

For a tweet, we expect a short text, so we do not look for a hierarchy between chunks.

However, we pay extra attention to non-standard language, such as abbreviations, slang, and emojis.
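To give an idea of what "non-standard language" covers, here is a hypothetical sketch that flags mentions, hashtags, a few abbreviations, and emojis in a short text. The abbreviation set and `flag_nonstandard` are assumptions made for illustration; slang detection is left out, and emojis are approximated by the Unicode "Symbol, other" (So) category, which also contains a few non-emoji symbols.

```python
import unicodedata
from typing import Dict, List

# Hypothetical, deliberately tiny abbreviation list; a real one would be much larger.
ABBREVIATIONS = {"idk", "imo", "thx", "u", "btw"}


def flag_nonstandard(tweet: str) -> Dict[str, List[str]]:
    """Collect tokens that differ from standard prose in a short text."""
    flags: Dict[str, List[str]] = {"mentions": [], "hashtags": [], "abbreviations": []}
    for token in tweet.split():
        word = token.strip(".,!?;:")  # drop surrounding punctuation
        if word.startswith("@"):
            flags["mentions"].append(word)
        elif word.startswith("#"):
            flags["hashtags"].append(word)
        elif word.lower() in ABBREVIATIONS:
            flags["abbreviations"].append(word)
    # Most emojis belong to the Unicode "Symbol, other" (So) category.
    flags["emojis"] = [ch for ch in tweet if unicodedata.category(ch) == "So"]
    return flags


print(flag_nonstandard("thx @lettria, love it 😍 #NLP"))
# {'mentions': ['@lettria'], 'hashtags': ['#NLP'], 'abbreviations': ['thx'], 'emojis': ['😍']}
```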

Narrative

For narratives, we search for both structure and dialogue in chunks.

Example

...

He suddenly opened the door.

"Hello."
"What's happening here?"
"Who are you?"
"I am John!" → speaker 1
"Oh, and I am Kevin!" → speaker 2
"Great. Bye."

They both went home after that, not really knowing why.

-------

Chapter 7 - The Magic Forest

The magic forest was mysterious.

...
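The dialogue part of the narrative example above can be approximated by grouping consecutive quoted lines. This is a simplified, hypothetical sketch: `segment_narrative` and the choice of quote characters are assumptions, lines such as `"Hello," he said.` are treated as narration, and structure detection (chapter headings, separators) would be a separate step.

```python
from typing import List, Tuple

# Straight and curly double quotes; the choice of delimiters is an assumption.
OPENING_QUOTES = ('"', "\u201c")
CLOSING_QUOTES = ('"', "\u201d")


def is_dialogue(text: str) -> bool:
    """Treat a line as dialogue only if it is entirely enclosed in quotes."""
    return (len(text) > 1
            and text.startswith(OPENING_QUOTES)
            and text.endswith(CLOSING_QUOTES))


def segment_narrative(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive lines into ("dialogue", [...]) or ("narration", [...]) blocks."""
    blocks: List[Tuple[str, List[str]]] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines only separate paragraphs
        kind = "dialogue" if is_dialogue(text) else "narration"
        if blocks and blocks[-1][0] == kind:
            blocks[-1][1].append(text)  # extend the current block
        else:
            blocks.append((kind, [text]))  # start a new block
    return blocks


story = [
    "He suddenly opened the door.",
    "",
    '"Hello."',
    '"Who are you?"',
    '"I am John!"',
    "",
    "They both went home after that, not really knowing why.",
]
for kind, block in segment_narrative(story):
    print(kind, block)
```

Each dialogue block can then go through the same speaker analysis as a conversation.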

Other

This is the default input type. Only the indentation hierarchy is checked.
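An indentation hierarchy can be built with a stack: each line becomes a child of the closest preceding line that is indented less. The sketch below is a generic illustration of that technique, not Lettria's code; `Node`, `build_tree`, and the tab width of 4 are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    indent: int
    children: List["Node"] = field(default_factory=list)


def build_tree(lines: List[str]) -> Node:
    """Nest each line under the closest previous line with a smaller indentation."""
    root = Node(text="", indent=-1)  # sentinel that is never popped
    stack = [root]
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no hierarchy
        expanded = line.expandtabs(4)  # assumption: a tab counts as 4 spaces
        indent = len(expanded) - len(expanded.lstrip(" "))
        # Pop until the top of the stack is indented less than this line.
        while stack[-1].indent >= indent:
            stack.pop()
        node = Node(text=line.strip(), indent=indent)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def show(node: Node, depth: int = 0) -> None:
    """Print the tree, one level of indentation per depth."""
    for child in node.children:
        print("  " * depth + child.text)
        show(child, depth + 1)


show(build_tree(["Ingredients", "  Milk", "  Sugar", "Steps", "  Mix", "    Slowly"]))
```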

A single file may contain different types of data: text, tables, images, and more. Moreover, the text itself may be organized into several parts and sub-parts.

To deal with this, we divide the file's content into chunks.

This way, we can separate different types of data and build a hierarchy between chunks.
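One way to picture this is a tree of typed chunks. The structure below is a hypothetical sketch (the `Chunk` fields, the `kind` values, and the table content are made up for illustration), not Lettria's internal format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    kind: str      # hypothetical type label, e.g. "section", "text", "table", "image"
    content: str
    children: List["Chunk"] = field(default_factory=list)


def outline(chunk: Chunk, depth: int = 0) -> List[str]:
    """Return one line per chunk, indented by its depth in the hierarchy."""
    lines = [f"{'  ' * depth}{chunk.kind}: {chunk.content}"]
    for child in chunk.children:
        lines.extend(outline(child, depth + 1))
    return lines


doc = Chunk("section", "Chapter 7 - The Magic Forest", [
    Chunk("text", "The magic forest was mysterious."),
    Chunk("table", "species | count"),  # made-up table content
])
print("\n".join(outline(doc)))
```

Keeping the type on each chunk lets text, tables, and images be handled differently while still sharing one hierarchy.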

If you need more help, please reach out to us at hello@lettria.com