An experiment was performed using the entire collection and applying
the three methods described in Section 3.
In this case, noise words had a notable effect,
mainly on the TP method. Since the TP method selects one term at a time
for each text, a wrong selection may be crucial in the clustering task.
In some cases, this iterative process includes
words that dramatically change the composition of the texts. Thus, a term
with a very low DF value changes the threshold used in the clustering task.
We tried to address this problem by enriching the terms selected by TP.
This cannot be done with dictionaries of related terms such as WordNet,
since the terminology of the texts is very specialized (see ).
The problem was solved using n-grams as an approximation to related words.
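
The idea of using n-grams as a stand-in for related words can be illustrated with a minimal sketch. The text does not say whether character or word n-grams were used, nor how relatedness was scored, so everything below is an assumption made for illustration: character trigrams, the Dice coefficient as the overlap measure, a threshold of 0.5, and the hypothetical helper names `char_ngrams`, `dice` and `enrich`.

```python
def char_ngrams(term, n=3):
    """Return the set of character n-grams of a term.

    Terms no longer than n are returned whole (an assumption of this sketch).
    """
    if len(term) <= n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def dice(a, b):
    """Dice coefficient between two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def enrich(selected, vocabulary, n=3, threshold=0.5):
    """Add to the selected terms every vocabulary term whose n-gram
    overlap with at least one selected term reaches the threshold.

    The threshold value is hypothetical, not taken from the paper.
    """
    grams = {t: char_ngrams(t, n) for t in set(selected) | set(vocabulary)}
    enriched = set(selected)
    for candidate in vocabulary:
        if any(dice(grams[candidate], grams[s]) >= threshold for s in selected):
            enriched.add(candidate)
    return enriched


if __name__ == "__main__":
    tp_terms = ["clustering"]
    vocab = ["cluster", "clusters", "protein", "clustered"]
    print(sorted(enrich(tp_terms, vocab)))
```

Under these assumptions, morphological variants such as "cluster" and "clustered" are attached to a selected term like "clustering" without any external dictionary, which is the property that makes n-grams usable when the vocabulary is too specialized for a resource like WordNet.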
Figure: Behaviour of the DF, TS and TPMI term selection methods.