
Indexing reduction

To determine how document indexing reduction behaves in a CLIRS, we submitted a set of five runs to the contest, described as follows.
First Run → Full:
This run used the full documents as the evaluation corpus and served as the baseline for our experiments.
Second Run → TP10:
This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 10% around the TP.
Third Run → TP20:
This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 20% around the TP.
Fourth Run → TP40:
This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 40% around the TP.
Fifth Run → TP60:
This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 60% around the TP.
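
This section does not restate how the TP technique selects terms, so the following is only a minimal sketch under stated assumptions. It assumes the transition point is computed as TP = (sqrt(8·I1 + 1) − 1)/2, where I1 is the number of terms that occur exactly once in the document, and that a neighbourhood of u keeps the terms whose frequency lies in [TP·(1 − u), TP·(1 + u)]. The function names `transition_point` and `reduce_document` are hypothetical and are not taken from the experiments described here.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Assumed TP formula: (sqrt(8 * I1 + 1) - 1) / 2."""
    # I1 is the number of terms that occur exactly once in the document.
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def reduce_document(tokens, neighbourhood):
    """Keep only tokens whose frequency lies near the transition point.

    `neighbourhood` is a fraction, e.g. 0.10 for the TP10 run
    (assumed interpretation of "a neighbourhood of 10% around the TP").
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - neighbourhood), tp * (1 + neighbourhood)
    keep = {term for term, f in freqs.items() if low <= f <= high}
    # Preserve the original token order in the reduced document.
    return [t for t in tokens if t in keep]


# Toy document: six terms occur once, so I1 = 6 and TP = 3.
tokens = "a a a b b b c d e f g h".split()
reduced = reduce_document(tokens, 0.10)
```

Under these assumptions, a wider neighbourhood admits more frequency values and therefore keeps more terms, which agrees with the reduced corpora growing from TP10 to TP60.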

Table 1 shows the size of each evaluation corpus, together with the percentage of reduction obtained for it. As can be seen, the TP technique reduced the corpus size considerably (between 75.37% and 89.25%), which also implies a shorter indexing time in a CLIRS.


Table 1: Evaluation corpora
Corpus    Size (Kb)    % of Reduction
Full      117,345       0%
TP10       12,616      89.25%
TP20       19,660      83.25%
TP40       20,477      82.55%
TP60       28,903      75.37%
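
The percentages in Table 1 follow from the corpus sizes as 100 · (1 − reduced size / full size). The short sketch below recomputes them from the sizes listed in the table; the function name `reduction_percent` is hypothetical and used only for illustration.

```python
def reduction_percent(full_size, reduced_size):
    """Percentage by which the reduced corpus is smaller than the full one."""
    return 100.0 * (1 - reduced_size / full_size)


# Corpus sizes (Kb) copied from Table 1.
sizes_kb = {
    "Full": 117345,
    "TP10": 12616,
    "TP20": 19660,
    "TP40": 20477,
    "TP60": 28903,
}

for name, size in sizes_kb.items():
    pct = reduction_percent(sizes_kb["Full"], size)
    print(f"{name}: {pct:.2f}%")
```

Rounded to two decimals, the computed values match the table's last column.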


David Pinto 2006-05-25