
Performance Measurement

We used the $ F$-measure, which is commonly used in information retrieval [16], to determine which method performs best. Given a set of clusters $ \{G_1,\ldots,G_m\}$ and a set of classes $ \{C_1,\ldots,C_n\}$, the $ F$-measure between cluster $ i$ and class $ j$ is given by:

$\displaystyle F_{ij}=\frac{2\cdot P_{ij}\cdot R_{ij}}{P_{ij}+R_{ij}},$ (3)

where $ 1\le i\le m$ and $ 1\le j\le n$. The precision $ P_{ij}$ and the recall $ R_{ij}$ are defined as follows:

$\displaystyle P_{ij}=\frac{\mbox{Number of texts from cluster }i\mbox{ in class }j} {\mbox{Number of texts from cluster }i},$ (4)

and

$\displaystyle R_{ij}=\frac{\mbox{Number of texts from cluster }i\mbox{ in class }j} {\mbox{Number of texts in class }j}.$ (5)
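
As an illustration of Equations (3) to (5), the following minimal Python sketch computes $ P_{ij}$, $ R_{ij}$ and $ F_{ij}$ for one cluster and one class. It assumes that clusters and classes are represented as sets of text identifiers; the function name `pairwise_f` is hypothetical, and returning 0 when the cluster and the class share no texts is a convention adopted here, since Equation (3) is undefined when $ P_{ij}+R_{ij}=0$.

```python
def pairwise_f(cluster, cls):
    """Return (P_ij, R_ij, F_ij) for one cluster and one class.

    Both arguments are sets of text identifiers (an assumption of this sketch).
    """
    overlap = len(cluster & cls)  # texts from cluster i that belong to class j
    if overlap == 0:
        # Convention of this sketch: no shared texts gives a score of 0.
        return 0.0, 0.0, 0.0
    p = overlap / len(cluster)    # Equation (4)
    r = overlap / len(cls)        # Equation (5)
    f = 2 * p * r / (p + r)       # Equation (3)
    return p, r, f


# Toy data: a cluster of 4 texts and a class of 6 texts sharing 2 texts.
cluster = {1, 2, 3, 4}
cls = {3, 4, 5, 6, 7, 8}
p, r, f = pairwise_f(cluster, cls)
```

With this toy data, 2 of the 4 texts in the cluster belong to the class, so $ P_{ij}=1/2$, $ R_{ij}=1/3$ and $ F_{ij}=2/5$.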

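The global performance of the clustering is computed from the values of $ F_{ij}$: each cluster contributes the score of its best-matching class, weighted by the relative size of the cluster. With $ \vert D\vert$ denoting the total number of texts in the collection, the overall $ F$-measure is: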

$\displaystyle F=\sum_{1\le i\le m}\frac{\vert G_i\vert}{\vert D\vert}\max_{1\le j\le n}F_{ij}.$ (6)
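
Equation (6) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `global_f` is hypothetical, clusters and classes are assumed to be sets of text identifiers, and $ \vert D\vert$ is taken to be the sum of the cluster sizes, which assumes that the clusters partition the collection.

```python
def global_f(clusters, classes):
    """Overall F-measure of Equation (6).

    clusters, classes: lists of sets of text identifiers (assumed representation).
    """
    total = sum(len(g) for g in clusters)  # |D|, assuming the clusters partition D
    if total == 0:
        return 0.0
    score = 0.0
    for g in clusters:
        best = 0.0  # max over classes j of F_ij for this cluster
        for c in classes:
            overlap = len(g & c)
            if overlap == 0:
                continue  # convention of this sketch: F_ij = 0 without shared texts
            p = overlap / len(g)                 # precision P_ij
            r = overlap / len(c)                 # recall R_ij
            best = max(best, 2 * p * r / (p + r))
        score += len(g) / total * best           # weight by relative cluster size
    return score


# Toy data: six texts, two clusters, two classes.
clusters = [{1, 2, 3}, {4, 5, 6}]
classes = [{1, 2, 3, 4}, {5, 6}]
overall = global_f(clusters, classes)
perfect = global_f(classes, classes)
```

In this toy example the first cluster matches the first class best ($ F_{11}=6/7$) and the second cluster matches the second class best ($ F_{22}=4/5$), so $ F=\frac{1}{2}\cdot\frac{6}{7}+\frac{1}{2}\cdot\frac{4}{5}=29/35\approx 0.83$. A clustering identical to the classes obtains $ F=1$.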



David Pinto 2006-05-25