

Description of the corpora

In our experiments we used three corpora that differ in size and in how balanced they are. We consider all of them suitable for our experiments because they belong to very narrow domains and their texts (abstracts) are short on average. The following subsections describe each corpus in detail.
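The corpus characteristics mentioned above (size, balance, and average length per abstract) can be made concrete with a small profiling routine. The sketch below is illustrative only: the `(category, text)` input format, whitespace tokenization, the hypothetical function name `corpus_profile`, and the ratio of the smallest to the largest category size used as a balance score are all assumptions, not the measures used in this work.

```python
from collections import Counter


def corpus_profile(docs):
    """Summarize a labelled corpus.

    docs: list of (category, text) pairs (assumed, hypothetical format).
    """
    if not docs:
        raise ValueError("corpus is empty")
    # Number of documents per category.
    sizes = Counter(category for category, _ in docs)
    n_docs = len(docs)
    # Assumption: naive whitespace tokenization to count words.
    n_tokens = sum(len(text.split()) for _, text in docs)
    # Assumption: balance = smallest / largest category size
    # (1.0 means every category has the same number of documents).
    balance = min(sizes.values()) / max(sizes.values())
    return {
        "documents": n_docs,
        "categories": len(sizes),
        "tokens": n_tokens,
        "avg_words_per_abstract": n_tokens / n_docs,
        "balance": balance,
    }


# Toy input only; these are not texts from any of the corpora.
toy = [
    ("A", "clustering short texts"),
    ("A", "narrow domain abstracts are hard"),
    ("B", "word sense disambiguation"),
]
print(corpus_profile(toy))
```

On the toy input, category A has two documents and B has one, so the balance score is 0.5; a perfectly balanced corpus would score 1.0.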




David Pinto 2007-05-08