Figure 1 shows the values obtained by each term selection method when run over different percentages of the collection's vocabulary (from 600 to 2,000 terms).

Given a percentage of the collection vocabulary, the DF and TS methods selected the highest-scoring terms. The TP method selected terms locally; i.e., it took a given number of terms from each text. Therefore, the methods must be compared through the vocabularies that each selection produces. The DF and TS methods used from 2% to 70% of the vocabulary terms, which corresponds to between 21 and 1,700 of the total terms in the collection. The TP selection method took from 5 to 30 terms from each text, giving a similar range of total terms. Fig. 1 shows the results of these three methods; the horizontal axis represents the number of terms and the vertical axis the values (eq. 6). To apply the TS method, the similarity matrix was calculated as 3-tuples ( ) and sorted according , then was computed for all terms. Since only 1,349 terms were obtained, the threshold was fixed to 0.
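The difference between global selection (as in DF and TS) and local, per-text selection (as in TP) can be illustrated with a minimal sketch. The function names, the toy texts, and the local score are assumptions made for illustration only: the local score used here is plain term frequency, a stand-in and not the TP score itself.

```python
from collections import Counter


def select_global(texts, fraction):
    """Keep the top `fraction` of the vocabulary, ranked by document frequency."""
    # Document frequency: number of texts in which each term occurs.
    df = Counter(term for text in texts for term in set(text))
    # Sort by descending DF; break ties alphabetically for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    k = max(1, round(fraction * len(ranked)))
    return set(ranked[:k])


def select_local(texts, per_text, score):
    """Keep `per_text` terms from each text; the vocabulary is their union."""
    vocab = set()
    for text in texts:
        tf = Counter(text)
        ranked = sorted(tf, key=lambda t: (-score(tf, t), t))
        vocab.update(ranked[:per_text])
    return vocab


# Hypothetical toy collection: each text is a list of tokens.
texts = [["a", "b", "a", "c"], ["a", "d", "d"], ["b", "d", "e"]]

# Stand-in local score (assumption): raw term frequency within the text.
tf_score = lambda tf, term: tf[term]

global_vocab = select_global(texts, 0.4)
local_vocab = select_local(texts, 1, tf_score)
print(len(global_vocab), len(local_vocab))
```

Because the local method fixes the number of terms per text rather than the size of the final vocabulary, the two kinds of selection can only be placed on a common axis by counting the distinct terms each one yields.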

The DF method was very stable, but it did not help the clustering task. From the beginning, DF included the most frequent terms in the texts, which contributed to maintaining a minimum level of similarity during clustering. The baseline, i.e. the clustering done without term selection (), indicates that DF selects terms that represent the texts while maintaining their resemblance to the original ones. On the other hand, the TS method reached its maximum value after 700 terms, and after 900 terms it became as stable as the DF method.

The TP method outperformed the other two methods. Its maximum value was 0.6415, reached with a vocabulary size of 1,661 terms, which corresponds to only 22 terms per text. The instability of the TP method derives from noise words that are difficult to detect because of their low frequencies. The next subsection presents an analysis of the TP selection aimed at controlling this instability.