When you upload a file to import into a dataset, you are asked to choose an input type.
Selecting the type of data being imported helps Lettria analyze your text more accurately. The sections below describe what changes for each input type.
Conversation
For conversations, we need to determine which speaker says each sentence. With that information, we can then resolve coreferences and assign the resulting clusters correctly.
The speaker information may already be provided in the speech-to-text result. In that case, no further analysis is needed.
If speaker information is not provided, we attempt to detect dialogue through text analysis.
Example
Hello.
What's happening here?
Who are you?
I am John! → speaker 1
Oh, and I am Kevin! → speaker 2
Great. Bye.
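To make the example above concrete, here is a toy sketch of how self-introductions could be turned into speaker labels. It is an illustration only: the pattern, the function name label_speakers, and the numbering scheme are assumptions made for this example, and Lettria's actual dialogue detection is not described in this article.

```python
import re

# Hypothetical pattern: a self-introduction such as "I am John".
# Real dialogue detection would rely on far richer text analysis.
INTRODUCTION = re.compile(r"\bI am ([A-Z][a-z]+)")

def label_speakers(sentences):
    """Return one speaker number per sentence, or None when unknown."""
    speakers = {}  # maps a name to its speaker number
    labels = []
    for sentence in sentences:
        match = INTRODUCTION.search(sentence)
        if match:
            name = match.group(1)
            # Number speakers in order of first appearance, starting at 1.
            speakers.setdefault(name, len(speakers) + 1)
            labels.append(speakers[name])
        else:
            labels.append(None)
    return labels

conversation = [
    "Hello.",
    "What's happening here?",
    "Who are you?",
    "I am John!",
    "Oh, and I am Kevin!",
    "Great. Bye.",
]
labels = label_speakers(conversation)
```

In this sketch, only the two sentences that contain a self-introduction receive a speaker number; the others stay unlabeled.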
Report
For reports, we attempt to build a document structure across chunks using both indentation and text analysis.
Structure detection using parts and sub-parts
Examples
1 2 3
1 1.1 1.2
A) B) C)
1.a 1.b 2.a 2.b
I II III IV
Introduction, Section 1, Section 2, Conclusion
Chapter 2, Paragraph 4
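As a rough illustration of how numbering schemes like the ones above could be recognized, the sketch below classifies the marker at the start of a line. The pattern names, the keyword list, and the functions classify_marker and dotted_depth are assumptions made for this example, not Lettria's actual rules.

```python
import re

# Hypothetical patterns, checked in order from most to least specific.
MARKER_PATTERNS = [
    ("keyword", re.compile(r"^(Chapter|Section|Paragraph|Part)\s+\d+\b", re.IGNORECASE)),
    ("dotted", re.compile(r"^\d+(\.(\d+|[a-z]))+\b")),   # 1.1, 1.2, 1.a, 2.b
    ("letter", re.compile(r"^[A-Z]\)")),                 # A) B) C)
    ("roman", re.compile(r"^[IVXLCDM]+\b")),             # I II III IV
    ("number", re.compile(r"^\d+\b")),                   # 1 2 3
]

def classify_marker(line):
    """Return the name of the first matching marker style, or None."""
    text = line.strip()
    for name, pattern in MARKER_PATTERNS:
        if pattern.match(text):
            return name
    return None

def dotted_depth(marker):
    """Depth implied by a dotted marker: '1' is 1, '1.2' is 2."""
    return marker.count(".") + 1
```

Pattern matching alone has clear limits: unnumbered titles such as "Introduction" or "Conclusion" are not recognized, and a sentence starting with the pronoun "I" would be mistaken for a Roman numeral. This is where text analysis is needed in addition to markers.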
Structure detection using ordered and unordered lists
Example
This is what we need:
- Milk
- Sugar
- Flour
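The list example above can be sketched as follows: consecutive bullet lines are grouped together and attached to the line that introduces them. The bullet pattern and the function group_lists are assumptions made for this illustration, not a description of Lettria's implementation.

```python
import re

# Hypothetical pattern: unordered bullets (-, *, •) or ordered ones (1. or 1)).
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*)$")

def group_lists(lines):
    """Group consecutive list items under the line that precedes them."""
    groups = []
    current = None        # the list currently being filled, if any
    previous_text = None  # last non-empty line that was not a list item
    for line in lines:
        match = BULLET.match(line)
        if match:
            if current is None:
                current = {"intro": previous_text, "items": []}
                groups.append(current)
            current["items"].append(match.group(1).strip())
        else:
            current = None  # a non-item line ends the current list
            previous_text = line.strip() or previous_text
    return groups

groups = group_lists(["This is what we need:", "- Milk", "- Sugar", "- Flour"])
```

Here the three items end up in a single group whose "intro" is the sentence "This is what we need:".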
Review
A review is a short text in which the speaker or writer expresses their opinion on something.
Tweet
For a Tweet, we expect a short text, so we do not search for a hierarchy of chunks.
However, we take extra care with non-standard language, such as abbreviations, slang, and emojis.
Narrative
For narratives, we search chunks for both structure and dialogue.
Example
...
He suddenly opened the door.
"Hello."
"What's happening here?"
"Who are you?"
"I am John!" → speaker 1
"Oh, and I am Kevin!" → speaker 2
"Great. Bye."
They both went home after that, not really knowing why.
-------
Chapter 7 - The Magic Forest
The magic forest was mysterious.
...
Other
This is the default input type. Only indentation hierarchy is checked.
A single file may contain different types of data: text, tables, images, and others. Moreover, text itself may be organised into several parts and sub-parts.
To deal with this, we divide the file's content into different chunks.
This way, we can separate different types of data, and apply hierarchy between chunks.
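As a minimal sketch of what an indentation hierarchy between chunks could look like, the code below nests each line under the nearest preceding line with a smaller indent. The function build_hierarchy, the dictionary layout, and the tab_width parameter are assumptions made for this example, with each non-empty line treated as one chunk for simplicity.

```python
def build_hierarchy(lines, tab_width=4):
    """Nest lines by indentation and return the top-level chunks."""
    root = {"text": None, "indent": -1, "children": []}
    stack = [root]  # path from the root to the most recent chunk
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        expanded = raw.expandtabs(tab_width)
        indent = len(expanded) - len(expanded.lstrip())
        node = {"text": expanded.strip(), "indent": indent, "children": []}
        # Climb back up until the top of the stack is a valid parent.
        while stack[-1]["indent"] >= indent:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]

tree = build_hierarchy([
    "Intro",
    "  Part A",
    "    Detail",
    "  Part B",
    "Conclusion",
])
```

In this example, "Intro" and "Conclusion" are top-level chunks, "Part A" and "Part B" are children of "Intro", and "Detail" is a child of "Part A".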
If you need more help, please reach out to us at hello@lettria.com.