Clustering of Abstracts in a Narrow Domain

As mentioned in Section 1, previous work on clustering abstracts in a narrow domain (see [9], [1], and [6]) used a very small collection (only 48 abstracts in 6 categories). A larger real corpus is therefore needed to verify the results obtained. In the following, we introduce the hep-ex collection, a real corpus obtained from CERN.


David Pinto 2006-05-25