As mentioned in Section 1, previous work on clustering abstracts in a narrow domain
used a very small collection (only 48 abstracts in 6 categories).
A larger real-world corpus is therefore needed to verify the results obtained. In what follows, we introduce the hep-ex
collection, a real corpus obtained from CERN.