To determine how document indexing reduction affects a CLIRS, we submitted
a set of five runs to the contest, described as follows.
- First Run (Full): This run used the full documents as the evaluation corpus and served as the baseline for our experiments.
- Second Run (TP10): This run used an evaluation corpus built by reducing every document with the TP technique, using a neighbourhood of 10% around the TP.
- Third Run (TP20): This run used an evaluation corpus built by reducing every document with the TP technique, using a neighbourhood of 20% around the TP.
- Fourth Run (TP40): This run used an evaluation corpus built by reducing every document with the TP technique, using a neighbourhood of 40% around the TP.
- Fifth Run (TP60): This run used an evaluation corpus built by reducing every document with the TP technique, using a neighbourhood of 60% around the TP.
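The reduction step behind the TP runs can be illustrated with a minimal sketch. It assumes that "a neighbourhood of 10% around TP" means keeping only the terms whose frequency within the document lies between 90% and 110% of the transition point; that reading, the function name `reduce_document`, and the parameters `tp` and `neighbourhood` are illustrative assumptions, not definitions taken from this section. The transition point itself is passed in as a given value, since its computation is not described here.

```python
from collections import Counter


def reduce_document(tokens, tp, neighbourhood):
    """Keep only the tokens whose in-document frequency falls inside the
    assumed interval [tp * (1 - neighbourhood), tp * (1 + neighbourhood)].

    tokens        -- list of terms of one document (hypothetical input format)
    tp            -- transition point frequency, supplied by the caller
    neighbourhood -- fraction around tp, e.g. 0.10 for the TP10 run
    """
    freq = Counter(tokens)
    low = tp * (1.0 - neighbourhood)
    high = tp * (1.0 + neighbourhood)
    # Preserve the original token order; drop terms outside the interval.
    return [t for t in tokens if low <= freq[t] <= high]


# Toy document: "a" occurs 10 times, "b" 5 times, "c" once.
doc = ["a"] * 10 + ["b"] * 5 + ["c"]
tp10 = reduce_document(doc, tp=10, neighbourhood=0.10)
tp60 = reduce_document(doc, tp=10, neighbourhood=0.60)
```

Under this assumption a wider neighbourhood admits more terms, which would be consistent with the reduced corpora growing from TP10 to TP60.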
Table 1 shows the size of every evaluation corpus used,
as well as the percentage of reduction obtained for each one. As can be seen,
the TP technique achieved a large reduction (between 75% and 89%),
which also implies a shorter indexing time in a CLIRS.
Table 1: Evaluation corpora
| Corpus | Size (Kb) | % of Reduction |
| Full | 117,345 | 0% |
| TP10 | 12,616 | 89.25% |
| TP20 | 19,660 | 83.25% |
| TP40 | 20,477 | 82.55% |
| TP60 | 28,903 | 75.37% |
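The reduction percentages in Table 1 follow directly from the corpus sizes. As a small check, the sketch below recomputes them as 100 × (1 − reduced size / full size); the function name `reduction_percentage` and the dictionary `sizes_kb` are illustrative names, while the sizes are those reported in the table.

```python
def reduction_percentage(full_size, reduced_size):
    """Percentage by which reduced_size is smaller than full_size."""
    return 100.0 * (1.0 - reduced_size / full_size)


# Corpus sizes in Kb, as listed in Table 1.
sizes_kb = {
    "Full": 117345,
    "TP10": 12616,
    "TP20": 19660,
    "TP40": 20477,
    "TP60": 28903,
}

reductions = {
    name: reduction_percentage(sizes_kb["Full"], size)
    for name, size in sizes_kb.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
# Prints:
# Full: 0.00%
# TP10: 89.25%
# TP20: 83.25%
# TP40: 82.55%
# TP60: 75.37%
```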
David Pinto
2006-05-25