We have carried out a comparative study of the behaviour of five clustering methods applied
to two corpora with very different characteristics. Each corpus belongs to a very narrow domain, which makes
our task even more difficult. The transition point technique has been successful: we have observed that
it obtains the best results in comparison with the DF and TS techniques. Moreover, those results are
stable across the different clustering algorithms. This suggests that the feature selection techniques are
independent of the clustering methods. Although we have used a very strong
measure to evaluate the clustering process (F-Measure), it would be desirable to repeat the experiments on other
corpora from different domains to confirm our hypothesis. Unfortunately, at the moment there is a lack
of gold standards for clustering abstracts in narrow domains, which makes this task even more difficult. We consider
that the narrow-domain clustering task requires more attention from the linguistic community, not
only to experiment with different feature selection techniques, but also to construct new narrow-domain
corpora, with gold standards provided by experts in those domains.
David Pinto
2006-05-25