
Union of Entropy and TP

This representation combines the benefits of both approaches, TP and entropy. TP represents each text independently, whereas entropy yields terms with better discriminating power; we therefore select the terms that satisfy either of these two conditions. The representation of a document $ D_i$ is then given by:

$\displaystyle H_i'=H_i\cup R_i$ (9)

Two weighting criteria were adopted in this representation schema. Terms provided by $ H_i$ (Equation (5)) are weighted by Equation (1), and terms provided by $ R_i$ (Equation (6)) are weighted by Equation (7), a modified version of Equation (1). To determine $ H_i'$, all terms that belong to $ H_i$ are added to the set $ R_i$. Terms that appear in both sets, $ w_j\in H_i\cap R_i$, are then weighted by Equation (7).
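The union and its weighting rule can be illustrated with a minimal sketch. It assumes the term sets $ H_i$ and $ R_i$ have already been computed, and it takes the weighting functions of Equations (1) and (7) as caller-supplied callables, since those equations are defined elsewhere. The names `union_representation`, `weight_eq1`, and `weight_eq7` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, Set


def union_representation(
    h_terms: Set[str],
    r_terms: Set[str],
    weight_eq1: Callable[[str], float],
    weight_eq7: Callable[[str], float],
) -> Dict[str, float]:
    """Build H_i' = H_i U R_i and assign one weight per term.

    Assumption: weight_eq1 and weight_eq7 stand in for Equations (1)
    and (7); their actual definitions are not reproduced here.
    """
    weights: Dict[str, float] = {}
    for term in h_terms | r_terms:
        if term in r_terms:
            # Terms of R_i, including those shared with H_i, use Equation (7).
            weights[term] = weight_eq7(term)
        else:
            # Terms found only in H_i use Equation (1).
            weights[term] = weight_eq1(term)
    return weights


# Toy usage with placeholder weights (not the real equations).
example = union_representation(
    {"alpha", "beta"},
    {"beta", "gamma"},
    weight_eq1=lambda term: 1.0,
    weight_eq7=lambda term: 2.0,
)
```

In this toy call, `alpha` belongs only to $ H_i$ and receives the Equation (1) placeholder, while `beta` (in the intersection) and `gamma` (only in $ R_i$) receive the Equation (7) placeholder.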



David Pinto 2007-05-08