Next: Bibliography Up: Using Query-Relevant Documents Pairs Previous: Evaluation of the results

Conclusions

We have described a model for cross-language information retrieval based on query-relevant document pairs (QRDP). The QRDP model uses a statistical dictionary of associated words directly to rank documents by their relevance to the query. We consider that inaccuracies in query translation have a negative effect on document retrieval and that, therefore, using the probabilistic association values should help to overcome this problem.
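As an illustration of ranking with an association dictionary, the following minimal Python sketch scores each document by an IBM-1 style likelihood of its words given the query words. The function names, the toy dictionary `toy_t`, the probability floor and the length normalization are all assumptions made for this example; the exact scoring formula of the QRDP model is not restated in this section.

```python
import math

def qrdp_score(query, document, t, floor=1e-9):
    """Length-normalized log-likelihood of the document words given the query.

    t maps (document_word, query_word) to an association probability.
    The floor and the normalization are assumptions of this sketch.
    """
    if not query or not document:
        return float("-inf")
    log_p = 0.0
    for d_word in document:
        # Uniform alignment: average the association over all query positions.
        p = sum(t.get((d_word, q_word), 0.0) for q_word in query) / len(query)
        log_p += math.log(max(p, floor))
    return log_p / len(document)

def rank(query, collection, t):
    """Return document ids sorted from most to least relevant."""
    return sorted(
        collection,
        key=lambda doc_id: qrdp_score(query, collection[doc_id], t),
        reverse=True,
    )

# Hypothetical toy dictionary: English query words, Spanish document words.
toy_t = {("casa", "house"): 0.8, ("hogar", "house"): 0.2, ("perro", "dog"): 0.9}
collection = {"d1": ["casa", "grande"], "d2": ["perro"]}
ranking = rank(["house"], collection, toy_t)
```

In this toy setting the document containing "casa" is ranked ahead of the one containing only "perro", without the query ever being translated explicitly.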

Statistical machine translation is often applied to CLIR in the literature; what we propose in this paper is to derive the translation (association) dictionary from query-relevant document pairs. The probabilistic model assumes that the order of the words in the query is not important, so each position in a document is equally likely to be connected to each position in the query. Although this assumption is unrealistic in machine translation, we consider the IBM-1 model to be particularly well suited to our approach.
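The uniform-alignment assumption can be made concrete with a short sketch of the standard IBM-1 EM procedure applied to query-document pairs. This is a minimal illustration and not the implementation used in our experiments: the function name `train_ibm1`, the direction of the dictionary (document word given query word), the omission of a null word and the toy training pairs are assumptions of this example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(d_word, q_word)] = p(d_word | q_word) with EM.

    pairs is a list of (query_words, document_words) tuples. Word order is
    ignored: every document position may align to every query position.
    """
    doc_vocab = {d for _, doc in pairs for d in doc}
    # Start from a uniform distribution over the document vocabulary.
    t = defaultdict(lambda: 1.0 / len(doc_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for query, doc in pairs:
            for d in doc:
                norm = sum(t[(d, q)] for q in query)
                for q in query:
                    frac = t[(d, q)] / norm
                    count[(d, q)] += frac
                    total[q] += frac
        # M-step: renormalize the counts per query word.
        t = defaultdict(float, {(d, q): c / total[q] for (d, q), c in count.items()})
    return t

# Hypothetical toy pairs: English queries with relevant Spanish documents.
pairs = [
    (["house"], ["casa"]),
    (["house", "green"], ["casa", "verde"]),
    (["green"], ["verde"]),
]
t = train_ibm1(pairs)
```

On these toy pairs the probability mass for "house" concentrates on "casa" after a few iterations, although the model never uses the position of a word in the query or the document.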

We have used a term selection technique to reduce the size of the training corpus, with good results. For instance, with a reduction of 82.5%, the results can improve on those obtained with the complete corpus.

Last but not least, we emphasize that the QRDP probabilistic model is language independent and can therefore be employed to model cross-language query-document pairs in any language.

David Pinto 2007-10-05