To determine how document indexing reduction affects a CLIRS, we submitted
a set of five runs to the contest, described as follows.
- First Run: Full:
- This run used the "Full documents" as the evaluation corpus and constituted the baseline for our experiments.
- Second Run: TP10:
- This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 10% around TP.
- Third Run: TP20:
- This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 20% around TP.
- Fourth Run: TP40:
- This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 40% around TP.
- Fifth Run: TP60:
- This run used an evaluation corpus obtained by reducing every document with the TP technique, using a neighbourhood of 60% around TP.
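The reduction step behind the TP runs can be illustrated with a minimal sketch. This section does not define how TP is computed, so the sketch assumes one commonly cited formulation, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of distinct terms that occur exactly once in the document. The function names, the whitespace tokenisation, and the choice to keep every occurrence of a retained term are hypothetical and are not taken from the runs themselves.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Assumed TP formula: (sqrt(8 * I1 + 1) - 1) / 2,
    where I1 is the number of terms with frequency 1."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def reduce_document(tokens, neighbourhood):
    """Keep only the terms whose frequency lies within
    +/- `neighbourhood` (a fraction, e.g. 0.10) of TP."""
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low = tp * (1 - neighbourhood)
    high = tp * (1 + neighbourhood)
    keep = {term for term, f in freqs.items() if low <= f <= high}
    # Assumption: every occurrence of a kept term is retained, in order.
    return [t for t in tokens if t in keep]


if __name__ == "__main__":
    doc = "a a a b b c d e".split()
    for u in (0.10, 0.20, 0.40, 0.60):
        print(f"TP{int(u * 100)}:", reduce_document(doc, u))
```

Under these assumptions a wider neighbourhood admits more term frequencies, so the reduced document can only grow as the percentage increases, which is consistent with the ordering of the five runs.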
Table 1 shows the size of each evaluation corpus, as well as the
percentage of reduction obtained for each one. As can be seen, the TP
technique achieved a large reduction (between 75% and 89%), which also
implies a shorter indexing time in a CLIRS.
Table 1: Evaluation corpora
Corpus | Size (Kb) | % of Reduction
Full | 117,345 | 0%
TP10 | 12,616 | 89.25%
TP20 | 19,660 | 83.25%
TP40 | 20,477 | 82.55%
TP60 | 28,903 | 75.37%
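The reduction percentages in Table 1 follow directly from the corpus sizes. The short sketch below recomputes them as 100 * (1 - reduced / full); the dictionary and function names are illustrative only.

```python
# Corpus sizes in Kb, copied from Table 1.
SIZES_KB = {
    "Full": 117_345,
    "TP10": 12_616,
    "TP20": 19_660,
    "TP40": 20_477,
    "TP60": 28_903,
}


def reduction_percent(full_size, reduced_size):
    """Percentage by which the reduced corpus is smaller than the full one."""
    return 100 * (1 - reduced_size / full_size)


if __name__ == "__main__":
    full = SIZES_KB["Full"]
    for name, size in SIZES_KB.items():
        print(f"{name}: {reduction_percent(full, size):.2f}%")
```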
David Pinto
2006-05-25