
Conclusions

We have proposed an index reduction method for cross-language search engines that includes an enrichment step. Our proposal is based on the transition point (TP) technique, which allows indexing only the mid-frequency terms of each document. The method runs in linear time and can therefore be used in a wide range of practical tasks.
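As an illustration of mid-frequency term selection, the following is a minimal sketch and not the implementation used in our experiments. It assumes the commonly cited transition point formula TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms occurring exactly once in the document, and it keeps the terms whose frequency lies in a neighbourhood around TP. The function names and the neighbourhood width of 0.4 are hypothetical choices made for this example only.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Transition point of one document (assumed formula, see lead-in).

    freqs maps each term to its frequency in the document.
    """
    # I1: number of terms that occur exactly once.
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def mid_frequency_terms(tokens, width=0.4):
    """Return the terms whose frequency is close to the transition point.

    width is a hypothetical neighbourhood size: a term is kept when its
    frequency lies in [TP * (1 - width), TP * (1 + width)].
    """
    freqs = Counter(tokens)          # one pass over the document
    tp = transition_point(freqs)     # one pass over the vocabulary
    low, high = tp * (1 - width), tp * (1 + width)
    return {term for term, f in freqs.items() if low <= f <= high}


if __name__ == "__main__":
    # Toy document: six terms occur once, so TP = (sqrt(49) - 1) / 2 = 3.
    doc = ["a"] * 10 + ["b"] * 3 + ["c"] * 2 + list("uvwxyz")
    print(sorted(mid_frequency_terms(doc)))
```

Each document is processed with one pass over its tokens and one pass over its vocabulary, which is consistent with the linear cost stated above.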

After submitting our run, we observed an improvement when comparing our results with those of the BiEnEs task in WebCLEF 2005. With enrichment, more than 40% on MRR was achieved. However, the vector space model gave results similar to those of the Boolean model.

The TP technique has proved effective in diverse areas of NLP. Its two main strengths are the high semantic content of the selected terms and the sparseness it yields in document vectors for models based on the vector space model. In addition, its language independence allows the technique to be used in multilingual environments.

We consider that our approach may be improved by taking into account all the terms of the vocabulary in the enrichment process. Once the term expansion is done, the mid-frequency selection technique could be applied. Further analysis will investigate this issue.


David Pinto 2007-05-08