Although TP certainly reduces the dimensionality of the term space while increasing precision,
it obtains low recall. We therefore propose to enrich the terms
selected by this method with terms that have similar characteristics, using
a formula based on co-occurrence bigrams. Formally, given a document $D$ and its reduced version $D_{TP}$, made up of
only those terms selected by the TP approach, the new important
terms for $D$ are obtained as follows:
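$$D'_{TP} = D_{TP} \cup \{\, t' \mid t \in D_{TP},\ \mathrm{fr}(t'\,t) \geq 2 \ \text{or}\ \mathrm{fr}(t\,t') \geq 2 \,\} \qquad (8)$$
where $\mathrm{fr}(x\,y)$ is the frequency of the bigram $x\,y$.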
That is, we used a window of size one around each term of the TP-selected set $D_{TP}$,
and required a minimum frequency of two for each bigram as the condition for including new terms.
As , weighting for enriched terms follows Equations (1) and (7).
Terms
will use Equation (1) directly.
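The enrichment step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `enrich_with_bigrams`, the whitespace tokenization, and the choice to count bigram frequencies within a single token sequence are all assumptions made for illustration. The sketch takes the TP-selected terms as given and adds every term that appears immediately before or after one of them in a bigram occurring at least twice.

```python
from collections import Counter


def enrich_with_bigrams(tokens, tp_terms, min_freq=2):
    """Return tp_terms plus their frequent immediate neighbours.

    tokens   -- the document as an ordered list of terms (assumed pre-tokenized)
    tp_terms -- terms already selected by the TP approach (assumed given)
    min_freq -- minimum bigram frequency required to admit a new term
    """
    tp_terms = set(tp_terms)
    # A window of size one means we only look at adjacent token pairs.
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    enriched = set(tp_terms)
    for (left, right), freq in bigram_freq.items():
        if freq < min_freq:
            continue  # bigram too rare to justify adding a term
        if left in tp_terms:
            enriched.add(right)  # term following a TP term
        if right in tp_terms:
            enriched.add(left)   # term preceding a TP term
    return enriched


# Hypothetical toy document; "text" plays the role of a TP-selected term.
tokens = ("text clustering groups short text while "
          "text clustering ignores short text").split()
print(sorted(enrich_with_bigrams(tokens, {"text"})))
# → ['clustering', 'short', 'text']
```

In this toy input the bigrams "text clustering" and "short text" each occur twice, so "clustering" and "short" are added, whereas "while" is adjacent to "text" only once in each direction and is left out.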
David Pinto
2007-05-08