In order to evaluate the relative hardness formula used in the experiments, we carried out an unsupervised clustering of all the documents of each subcorpus obtained for each dataset. We chose the MajorClust clustering algorithm [7] because it takes into account both the inside and the outside similarities among the clusters obtained during its execution. To keep the validation independent of RH, we used the tf-idf formula to calculate the input similarity matrix for MajorClust. Each evaluation was performed with the F-measure, which is calculated as follows:
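given a set of clusters $\{G_1, \ldots, G_m\}$ and a set of classes $\{C_1, \ldots, C_n\}$, the F-measure between a cluster $G_i$ and a class $C_j$ is given by the following formula.

$$F_{ij} = \frac{2 \cdot P_{ij} \cdot R_{ij}}{P_{ij} + R_{ij}} \qquad (3)$$

where $1 \le i \le m$, $1 \le j \le n$. $P_{ij}$ and $R_{ij}$ are defined as follows:

$$P_{ij} = \frac{|G_i \cap C_j|}{|G_i|} \qquad (4)$$

and

$$R_{ij} = \frac{|G_i \cap C_j|}{|C_j|} \qquad (5)$$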
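The global performance of a clustering method is calculated by using the values of $F_{ij}$ (taking, for each cluster $G_i$, the best value over all classes), the cardinality $|G_i|$ of each cluster obtained, and normalising by the total number of documents in the collection ($|D|$). The obtained measure is named F-measure and it is shown in Equation (6).

$$F = \sum_{1 \le i \le m} \frac{|G_i|}{|D|} \max_{1 \le j \le n} F_{ij} \qquad (6)$$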
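The evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiments: the function name `clustering_f_measure` and the representation of a clustering as two parallel lists of labels (one cluster label and one class label per document) are assumptions made for the example. It computes precision and recall for every cluster/class pair, keeps the best pairwise F value for each cluster, and weights it by the cluster size over the collection size.

```python
from collections import Counter


def clustering_f_measure(cluster_labels, class_labels):
    """Global F-measure of a clustering against gold-standard classes.

    Both arguments are parallel sequences with one label per document
    (an assumed input format, chosen for illustration).
    """
    if len(cluster_labels) != len(class_labels):
        raise ValueError("one cluster label and one class label per document")

    total_docs = len(cluster_labels)                     # |D|
    cluster_sizes = Counter(cluster_labels)              # |G_i|
    class_sizes = Counter(class_labels)                  # |C_j|
    overlap = Counter(zip(cluster_labels, class_labels)) # |G_i ∩ C_j|

    score = 0.0
    for cluster, cluster_size in cluster_sizes.items():
        best_f = 0.0
        for klass, class_size in class_sizes.items():
            common = overlap[(cluster, klass)]
            if common == 0:
                continue  # precision and recall are both 0 for this pair
            precision = common / cluster_size
            recall = common / class_size
            pair_f = 2 * precision * recall / (precision + recall)
            best_f = max(best_f, pair_f)
        # Weight the best pairwise value by the relative cluster size.
        score += (cluster_size / total_docs) * best_f
    return score


if __name__ == "__main__":
    # Toy data (invented): four documents, two clusters, two classes.
    print(clustering_f_measure([0, 0, 0, 1], ["a", "a", "b", "b"]))
```

A clustering that reproduces the classes exactly obtains the maximum value of 1, and the value decreases as clusters mix documents from several classes or split a class across clusters.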
David Pinto
20071005