In order to evaluate the relative hardness (RH) formula used in the experiments, we carried out an unsupervised clustering of all the documents of each subcorpus obtained for each dataset. We chose the MajorClust clustering algorithm [7] because it takes into account both the inside and the outside similarities among the clusters obtained during its execution. To keep the validation independent of RH, we used the tf-idf formula to calculate the input similarity matrix for MajorClust. Each evaluation was performed with the F-measure, which is calculated as follows.
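The clustering step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the experimental code: documents are assumed to be already tokenised, the tf-idf weighting is assumed to be raw term frequency times log(N/df) (the exact variant is not specified here), pairwise similarity is assumed to be cosine, and MajorClust [7] is run with a fixed node order and deterministic tie-breaking, whereas the original algorithm visits nodes in arbitrary order. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight) for a list of token lists.

    Assumed weighting: raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Symmetric tf-idf cosine similarity matrix with a zero diagonal."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]


def majorclust(sim, max_iter=100):
    """MajorClust sketch: each node starts in its own cluster, then repeatedly
    adopts the label with the largest summed edge weight among its neighbours,
    until a full pass changes nothing (or max_iter passes are done)."""
    n = len(sim)
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for v in range(n):
            weight = Counter()
            for u in range(n):
                if u != v and sim[v][u] > 0.0:
                    weight[labels[u]] += sim[v][u]
            if not weight:
                continue  # isolated node keeps its own label
            best = max(weight.values())
            if weight[labels[v]] == best:
                continue  # current label already maximal: keep it (aids termination)
            # deterministic tie-break among maximal labels (an assumption)
            labels[v] = min(lab for lab, w in weight.items() if w == best)
            changed = True
        if not changed:
            break
    return labels
```

Keeping the current label whenever it is already among the maximisers avoids the node flipping back and forth between equally weighted labels; `max_iter` is an extra safety bound, not part of the published algorithm.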
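Given a set of clusters $\{G_1, \ldots, G_m\}$ and a set of classes $\{C_1, \ldots, C_l\}$, the F-measure between a cluster $i$ and a class $j$ is given by the following formula:

$$F_{i,j} = \frac{2 \cdot P_{i,j} \cdot R_{i,j}}{P_{i,j} + R_{i,j}} \tag{3}$$

where $1 \le i \le m$ and $1 \le j \le l$. $P_{i,j}$ and $R_{i,j}$ are defined as follows:

$$P_{i,j} = \frac{n_{i,j}}{n_i} \tag{4}$$

and

$$R_{i,j} = \frac{n_{i,j}}{n_j} \tag{5}$$

where $n_{i,j}$ is the number of documents of class $j$ that belong to cluster $i$, $n_i$ is the number of documents in cluster $i$, and $n_j$ is the number of documents in class $j$.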
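As a small hypothetical illustration of Equations (3) to (5) (the documents and labels are invented, not taken from the experiments), consider $n = 6$ documents split into two classes, $C_1 = \{d_1, d_2, d_3\}$ and $C_2 = \{d_4, d_5, d_6\}$, and a clustering into $G_1 = \{d_1, d_2\}$ and $G_2 = \{d_3, d_4, d_5, d_6\}$. Writing $n_{i,j}$ for the number of documents of class $j$ in cluster $i$, $n_i$ for the size of cluster $i$ and $n_j$ for the size of class $j$, the non-zero values are:

$$
\begin{aligned}
P_{1,1} &= \tfrac{2}{2} = 1, & R_{1,1} &= \tfrac{2}{3}, & F_{1,1} &= \frac{2 \cdot 1 \cdot \frac{2}{3}}{1 + \frac{2}{3}} = \frac{4}{5},\\
P_{2,1} &= \tfrac{1}{4}, & R_{2,1} &= \tfrac{1}{3}, & F_{2,1} &= \frac{2 \cdot \frac{1}{4} \cdot \frac{1}{3}}{\frac{1}{4} + \frac{1}{3}} = \frac{2}{7},\\
P_{2,2} &= \tfrac{3}{4}, & R_{2,2} &= 1, & F_{2,2} &= \frac{2 \cdot \frac{3}{4} \cdot 1}{\frac{3}{4} + 1} = \frac{6}{7},
\end{aligned}
$$

and $F_{1,2} = 0$, since no document of $C_2$ falls in $G_1$.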
The global performance of a clustering method is calculated by using the values of $n_i$ (the number of documents in cluster $i$) over the $m$ clusters obtained, and normalising by the total number of documents in the collection ($n$). The resulting measure is called the F-measure and is shown in Equation (6).

$$F = \sum_{i=1}^{m} \frac{n_i}{n} \max_{j} F_{i,j} \tag{6}$$
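The global measure can be computed directly from two parallel label lists. The sketch below is a minimal plain-Python version; it assumes Equation (6) takes, for each cluster $i$, its best $F_{i,j}$ over all classes and weights it by $n_i/n$, as the text describes. The function name `f_measure` and its list-based inputs are hypothetical choices for illustration.

```python
from collections import Counter


def f_measure(clusters, classes):
    """Global F-measure: sum over clusters of (n_i / n) * max_j F_{i,j}.

    clusters[d] is the cluster id and classes[d] the class label of document d.
    """
    n = len(clusters)
    n_ij = Counter(zip(clusters, classes))  # n_{i,j}: docs of class j in cluster i
    n_i = Counter(clusters)                 # n_i: size of cluster i
    n_j = Counter(classes)                  # n_j: size of class j
    total = 0.0
    for i, size_i in n_i.items():
        best = 0.0
        for j, size_j in n_j.items():
            common = n_ij[(i, j)]
            if common == 0:
                continue  # P = R = 0, so F_{i,j} = 0 (and avoids 0/0)
            p = common / size_i                      # precision, Eq. (4)
            r = common / size_j                      # recall, Eq. (5)
            best = max(best, 2 * p * r / (p + r))    # F_{i,j}, Eq. (3)
        total += size_i / n * best                   # weighted sum, Eq. (6)
    return total
```

For example, with clusters `[1, 1, 2, 2, 2, 2]` and classes `["a", "a", "a", "b", "b", "b"]` the best per-cluster scores are 4/5 and 6/7, so `f_measure` returns (2/6)(4/5) + (4/6)(6/7) = 88/105 ≈ 0.838; a clustering that reproduces the classes exactly scores 1.0.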
David Pinto
2007-10-05