The corpus covers four topics: Linguistics (semantics, syntax, morphology, and parsing), Ambiguity (word sense disambiguation (WSD), anaphora, part-of-speech (POS) tagging, and spelling), Lexicon (lexics, corpus, and text generation), and Text Processing (information retrieval, summarization, and text classification). Table 1 shows the distribution of abstracts across these categories, and Table 2 lists the features of the corpus.
Table 1. Distribution of the corpus.
Category | Number of abstracts |
Linguistics | 11 |
Ambiguity | 15 |
Lexicon | 11 |
Text Processing | 11 |

Table 2. Features of the corpus.
Feature | Value |
Size of the corpus (bytes) | 23,971 |
Number of categories | 4 |
Number of abstracts | 48 |
Total number of terms | 3,382 |
Vocabulary size (terms) | 953 |
Average number of terms per abstract | 70.45 |
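
Features of this kind can be derived directly from the raw abstracts. The following is a minimal sketch, not the procedure used to build the tables above: the function name `corpus_features`, the dictionary layout, and the tokenization (lowercasing and splitting on whitespace) are all assumptions made for illustration, since the text does not state how terms were counted.

```python
def corpus_features(abstracts_by_category):
    """Compute simple corpus statistics.

    abstracts_by_category: dict mapping a category name to a list of
    abstract strings (hypothetical layout, assumed for this sketch).
    """
    tokens = []
    n_abstracts = 0
    size_bytes = 0
    for texts in abstracts_by_category.values():
        for text in texts:
            # Assumption: a "term" is a lowercased whitespace-separated token.
            tokens.extend(text.lower().split())
            n_abstracts += 1
            size_bytes += len(text.encode("utf-8"))
    return {
        "size_bytes": size_bytes,
        "n_categories": len(abstracts_by_category),
        "n_abstracts": n_abstracts,
        "total_terms": len(tokens),
        "vocabulary_size": len(set(tokens)),
        "avg_terms_per_abstract": len(tokens) / n_abstracts if n_abstracts else 0.0,
    }


# Toy input, not the actual corpus.
toy = {"A": ["the cat sat", "the dog"], "B": ["a cat"]}
features = corpus_features(toy)
```

The same arithmetic can be used to cross-check the reported figures: the category counts sum to 11 + 15 + 11 + 11 = 48 abstracts, and 3,382 terms over 48 abstracts gives roughly 70.46 terms per abstract, so the reported 70.45 appears to be truncated rather than rounded.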