Next: Dataset Preprocessing Up: A Penalisation-Based Ranking Approach Previous: A Penalisation-Based Ranking Approach

Introduction

In recent years we have witnessed an explosion in the amount of information available on the Internet. The correct classification of this information is the most important challenge for the information retrieval field. The task is made even more difficult by the fact that the information comes from all around the world and, therefore, from very different cultures and languages. Moreover, current commercial search engines, such as Google and Yahoo, provide only monolingual information retrieval: given a query in a specific language, they retrieve query-related documents written in that same language. In other words, current search engines do not take the language of the query into account when its keywords are matched against the target document set.

Forums dedicated to the analysis of information search and retrieval, particularly in a cross-language environment, are therefore needed. WebCLEF is concerned with the evaluation of information retrieval systems on cross-lingual web pages. The justification for the WebCLEF track is that many of the issues for which people turn to the web are in essence multilingual. The first edition of this competition was held in 2005 in the framework of the Cross Language Evaluation Forum (CLEF) [1]. We participated in the mixed-monolingual task of WebCLEF 2006, using the EuroGOV corpus, which was compiled in 2005 before the WebCLEF campaign. This corpus consists of a crawl of European governmental sites from approximately 27 different Internet domains. A more detailed description of the corpus can be found in [5]; therefore, the next section does not describe the corpus itself, but the way we processed it in order to obtain the index terms. Section 3 explains the model we have implemented, and its evaluation is presented in Section 4. Finally, we discuss our participation in this competition and the results obtained.


David Pinto 2007-05-08