Performance measurement


We employed the $F$-measure, which is commonly used in information retrieval [24], to determine which method obtains the best performance. Given a set of clusters $\{G_1,\ldots,G_m\}$ and a set of classes $\{C_1,\ldots,C_n\}$, the $F$-measure between a cluster $G_i$ and a class $C_j$ is given by the following formula.

$\displaystyle F_{ij}=\frac{2\cdot P_{ij}\cdot R_{ij}}{P_{ij}+R_{ij}},$

(7)

where $1\le i\le m$ and $1\le j\le n$. The precision $P_{ij}$ and the recall $R_{ij}$ are defined as follows:

$\displaystyle P_{ij}=\frac{\mbox{Number of texts from cluster }i\mbox{ in class }j} {\mbox{Number of texts from cluster }i},$

(8)

and

$\displaystyle R_{ij}=\frac{\mbox{Number of texts from cluster }i\mbox{ in class }j} {\mbox{Number of texts in class }j}.$

(9)
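
As an illustration with hypothetical counts (not taken from the experiments), suppose cluster $i$ contains 10 texts, 6 of which belong to class $j$, and class $j$ contains 8 texts in total. Equations 7 to 9 then give

$\displaystyle P_{ij}=\frac{6}{10}=0.6,\qquad R_{ij}=\frac{6}{8}=0.75,\qquad F_{ij}=\frac{2\cdot 0.6\cdot 0.75}{0.6+0.75}=\frac{0.9}{1.35}\approx 0.667.$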

The global performance of a clustering method is calculated by taking, for each cluster $G_i$, its best value of $F_{ij}$ over all classes, weighting that value by the cardinality of the cluster ( $\vert G_i\vert$ ), and normalizing by the total number of documents in the collection ( $\vert D\vert$ ). The resulting measure is called the $F$-measure and is shown in equation 10.

$\displaystyle F=\sum_{1\le i\le m}\frac{\vert G_i\vert}{\vert D\vert}\max_{1\le j\le n}F_{ij}.$

(10)
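
The computation in equations 7 to 10 can be sketched in Python as follows. This is only a minimal illustration, not the code used in the experiments: the function names `pairwise_f` and `global_f` are hypothetical, clusters and classes are assumed to be given as sets of text identifiers, and $\vert D\vert$ is assumed to equal the sum of the cluster sizes (that is, the clusters partition the collection).

```python
def pairwise_f(cluster, cls):
    """F_ij (equation 7) between one cluster and one class, both sets of text ids."""
    overlap = len(cluster & cls)        # texts from cluster i that are in class j
    if overlap == 0:
        return 0.0                      # P_ij = R_ij = 0, so F_ij is taken as 0
    precision = overlap / len(cluster)  # equation 8
    recall = overlap / len(cls)         # equation 9
    return 2 * precision * recall / (precision + recall)


def global_f(clusters, classes):
    """Global F-measure (equation 10) for a list of clusters and a list of classes."""
    total = sum(len(g) for g in clusters)  # |D|, assuming the clusters partition D
    if total == 0:
        return 0.0
    return sum(
        len(g) / total * max((pairwise_f(g, c) for c in classes), default=0.0)
        for g in clusters
    )


# Hypothetical toy data: six texts, two clusters, two classes.
clusters = [{1, 2, 3}, {4, 5, 6}]
classes = [{1, 2, 3, 4}, {5, 6}]
print(round(global_f(clusters, classes), 4))
```

In this toy case the first cluster is best matched by the first class ($F_{11}=6/7$) and the second cluster by the second class ($F_{22}=0.8$), so the global value is $0.5\cdot 6/7+0.5\cdot 0.8\approx 0.8286$. A clustering that reproduces the classes exactly yields $F=1$.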


David Pinto 2007-05-08