The preliminary experiments were carried out on three corpora: the training and test sets of the R8 version of the Reuters collection and, partially, a reduced version of 20 Newsgroups named "Mini20Newsgroups". We pre-processed each corpus by removing punctuation symbols and stopwords and then applying the Porter stemmer. The characteristics of each corpus after pre-processing are given in Table 1.
Table 1: Characteristics of Reuters-R8 and Mini20Newsgroups
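The pre-processing steps described above (punctuation removal, stopword removal, stemming) can be illustrated with a minimal Python sketch. It is not the code used in the experiments: the function name `preprocess`, the tiny stopword list, the lowercasing step and the whitespace tokenization are all assumptions made for illustration, and the stemmer is passed in as a parameter that defaults to the identity function so the block runs without external libraries.

```python
import string

# Illustrative stopword list (assumption: the actual list used is not specified).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is", "are", "by"})

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stem=lambda word: word, stopwords=STOPWORDS):
    """Return the list of tokens of `text` after the three steps.

    `stem` is any callable mapping a word to its stem; the default is the
    identity, so no stemming happens unless a stemmer is supplied.
    """
    # Step 1: remove punctuation (lowercasing is an added assumption).
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    # Step 2: remove stopwords.
    tokens = [token for token in tokens if token not in stopwords]
    # Step 3: apply the stemmer to each remaining token.
    return [stem(token) for token in tokens]


if __name__ == "__main__":
    print(preprocess("The prices of oil, and gas."))
```

If NLTK is available, a Porter stemmer can be plugged in with `preprocess(text, stem=PorterStemmer().stem)` after `from nltk.stem import PorterStemmer`. Applying stopword removal before stemming follows the order given in the text.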