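We employed the F-measure, which is commonly used in information retrieval [24], to determine which method performs best. Given a set of clusters $\{G_1, \ldots, G_m\}$ and a set of classes $\{C_1, \ldots, C_n\}$, the F-measure between a cluster $i$ and a class $j$ is given by the following formula.

$$F_{ij} = \frac{2 \cdot P_{ij} \cdot R_{ij}}{P_{ij} + R_{ij}} \tag{7}$$

where $1 \le i \le m$ and $1 \le j \le n$. The precision $P_{ij}$ and the recall $R_{ij}$ are defined as follows:

$$P_{ij} = \frac{\text{number of texts from cluster } i \text{ in class } j}{\text{number of texts in cluster } i} \tag{8}$$

and

$$R_{ij} = \frac{\text{number of texts from cluster } i \text{ in class } j}{\text{number of texts in class } j} \tag{9}$$

The global performance of a clustering method is calculated from the values of $F_{ij}$ and the cardinality of each cluster obtained, $|G_i|$, normalized by the total number of documents in the collection ($|D|$). The resulting measure is named the F-measure and is shown in equation 10.

$$F = \sum_{1 \le i \le m} \frac{|G_i|}{|D|} \max_{1 \le j \le n} F_{ij} \tag{10}$$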
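As an illustration, the evaluation described by equations 7 to 10 can be sketched in Python. This is a minimal sketch rather than the implementation used in the experiments. It assumes that clusters and gold-standard classes are given as sets of document identifiers, that every document belongs to exactly one cluster (so the collection size is the sum of the cluster sizes), and that the pairwise score is 0 when a cluster and a class share no documents. The function names `pairwise_f` and `global_f` are hypothetical.

```python
from typing import Hashable, Sequence, Set


def pairwise_f(cluster: Set[Hashable], cls: Set[Hashable]) -> float:
    """F-measure between one cluster and one class (equations 7 to 9)."""
    overlap = len(cluster & cls)  # texts from the cluster that are in the class
    if overlap == 0:
        # Assumption: precision and recall are both 0 here, so the score
        # is defined as 0 instead of evaluating 0/0.
        return 0.0
    precision = overlap / len(cluster)  # equation 8
    recall = overlap / len(cls)         # equation 9
    return 2 * precision * recall / (precision + recall)  # equation 7


def global_f(clusters: Sequence[Set[Hashable]],
             classes: Sequence[Set[Hashable]]) -> float:
    """Global F-measure of a clustering (equation 10)."""
    # Assumption: the clusters form a hard partition of the collection,
    # so the total number of documents is the sum of the cluster sizes.
    total_docs = sum(len(g) for g in clusters)
    return sum(
        len(g) / total_docs * max(pairwise_f(g, c) for c in classes)
        for g in clusters
    )


if __name__ == "__main__":
    # Toy data: five documents, two clusters, two gold classes.
    clusters = [{1, 2, 3}, {4, 5}]
    classes = [{1, 2}, {3, 4, 5}]
    print(round(global_f(clusters, classes), 4))  # 0.8
```

In the toy example, each cluster's best-matching class yields a pairwise score of 0.8, so the size-weighted sum is also 0.8. A clustering that reproduces the classes exactly scores 1.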
David Pinto
20070508