Conclusions

We have shown that the relative hardness (RH) of clustering a corpus can be estimated with a measure based on vocabulary overlap. The results obtained show a correlation between the F-measure and the values given by the RH formula.
Unlike the analysis carried out in [1], the formula introduced in this work relies only on vocabulary overlap and does not use any classifier. We use the MajorClust clustering algorithm only to evaluate the quality of the proposed formula by means of the F-measure. The RH formula may therefore be used in an unsupervised manner to determine the relative hardness of clustering corpora.
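
As an illustration of what a measure based only on vocabulary overlap can look like, the following minimal sketch averages the Jaccard overlap between the vocabularies of every pair of document groups in a corpus. This is an assumption made for the sake of the example and not necessarily the RH formula itself: the choice of Jaccard overlap, the whitespace tokenization, and the names `vocabulary`, `jaccard`, and `mean_pairwise_overlap` are all hypothetical.

```python
from itertools import combinations


def vocabulary(texts):
    """Return the set of lowercased, whitespace-separated tokens in a list of texts."""
    return {token for text in texts for token in text.lower().split()}


def jaccard(a, b):
    """Jaccard overlap of two sets; taken to be 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(groups):
    """Average Jaccard overlap over all unordered pairs of group vocabularies.

    `groups` maps a group name to a list of texts. This averaging scheme is an
    illustrative assumption, not the formula defined in the paper.
    """
    vocabularies = [vocabulary(texts) for texts in groups.values()]
    pairs = list(combinations(vocabularies, 2))
    if not pairs:
        raise ValueError("at least two groups are required")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch, a corpus whose groups share most of their vocabulary yields a value close to 1, while a corpus with disjoint group vocabularies yields 0. No classifier or clustering algorithm is involved in the computation.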

David Pinto
2007-10-05