Next: Conclusions Up: Using Query-Relevant Documents Pairs Previous: The EuroGOV corpus

Evaluation of the results

In the experiments, we used the leave-one-out procedure, which is a standard way of estimating the generalisation power of a classifier, from both a theoretical and an empirical perspective [12].
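
As an illustration only, the leave-one-out procedure can be sketched as follows. The function name `leave_one_out` and the toy query/page pairs are hypothetical, and the sketch assumes that one item is held out at a time while all the others are used for training; the exact unit held out in our experiments (a query, a page, or a query-page pair) is not specified by this sketch.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out(items: Sequence[T]) -> Iterator[Tuple[List[T], T]]:
    """Yield (training_items, held_out_item) once for every item."""
    for i, held_out in enumerate(items):
        # Train on everything except the i-th item.
        training = list(items[:i]) + list(items[i + 1:])
        yield training, held_out


# Hypothetical toy data: (query, relevant page) pairs.
pairs = [("q1", "page_a"), ("q2", "page_b"), ("q3", "page_c")]
splits = list(leave_one_out(pairs))

for training, held_out in splits:
    print(len(training), "training pairs; held out:", held_out)
```

With n items this produces n train/test splits, so the model is trained n times and each item is used for testing exactly once.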

Table 2 shows the results of every run, each executed with only 10 iterations of the EM algorithm. The first column gives the name of the run carried out for each corpus, and the last column gives the Mean Reciprocal Rank (MRR) obtained for that run. The remaining columns give the Average Success At (ASA) different numbers of retrieved documents (1, 5, 10, 20 and 50). As can be seen, the best MRR was obtained on the evaluation corpus reduced with the TP technique using a neighbourhood of 40% (TP40: 0.0470, against 0.0361 for FULL), which is exactly the same percentage used in other research works (see [10] and [13]). We believe that this improvement derives from the elimination of noisy words, which helps to rank the web pages better.

Table 2: Evaluation results

         Average Success At
Run      1       5       10      20      50      MRR
FULL     0.0000  0.0299  0.0970  0.2687  0.3955  0.0361
TP10     0.0149  0.0522  0.0672  0.0970  0.4030  0.0393
TP20     0.0149  0.0299  0.0448  0.0746  0.4030  0.0323
TP40     0.0149  0.0448  0.1045  0.1940  0.3881  0.0470
TP60     0.0000  0.0448  0.1269  0.2164  0.4030  0.0383
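
For readers unfamiliar with the two measures, the following is a minimal sketch of how MRR and Average Success At k can be computed from ranked result lists. The function names and the toy ranks are hypothetical, and the sketch assumes that each topic is scored by the rank of its first relevant page and that a topic whose relevant page is never retrieved contributes zero; it is not the official evaluation script.

```python
from typing import Optional, Sequence


def first_relevant_rank(ranking: Sequence[str], target: str) -> Optional[int]:
    """Return the 1-based rank of the target page, or None if it is absent."""
    for rank, page in enumerate(ranking, start=1):
        if page == target:
            return rank
    return None


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all topics; a missing page contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def average_success_at(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of topics whose target page appears in the top k results."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical ranks for four topics (None: target page not retrieved).
ranks = [1, 3, None, 12]

print("MRR:", round(mean_reciprocal_rank(ranks), 4))
for k in (1, 5, 10, 20, 50):
    print("Success at", k, ":", average_success_at(ranks, k))
```

Under these assumptions Success at k can never decrease as k grows, and MRR rewards runs that place the relevant page near the top of the list.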

Three teams participated in the bilingual "English to Spanish" subtask of WebCLEF in 2005, and every team submitted at least one run [14,10,15]. Table 3 compares the results obtained by each team with our best results, which in this case were obtained with the TP40 corpus and 100 iterations of the EM algorithm. Each of these teams translated every query from English to Spanish and then used a traditional monolingual information retrieval system to carry out the search. In particular, the UNED team reported two results (UNED_FULL and UNED_BODY), which differ in the information used from each web page. Their first approach (UNED_FULL) makes use of the information stored in the HTML fields or tags identified during preprocessing, such as title, metadata, headings, body and outgoing links. Their second approach (UNED_BODY) considers only the information in the body field. We also considered only the information inside the body HTML tag and, therefore, the UNED_BODY run can be used for comparison. The ALICANTE team, on the other hand, used a combination of three translation systems to obtain the best translation of a query. They then used a passage retrieval-based system as a search engine, indexing all the information in the documents except the HTML tags.

Table 3: Comparison results over 134 topics

              Average Success At
Run name      1       5       10      20      50      MRR
OurApproach   0.0672  0.1045  0.1418  0.2164  0.4403  0.0963
UNED_FULL     0.0821  0.1045  0.1194  0.1343  0.2090  0.0930
BUAP/UPV40    0.0597  0.0970  0.1119  0.1418  0.2164  0.0844
UNED_BODY     0.0224  0.0672  0.1045  0.1716  0.2612  0.0477
BUAP/UPVFull  0.0224  0.0672  0.1119  0.1418  0.1866  0.0465
ALICANTE      0.0299  0.0522  0.0597  0.0746  0.0970  0.0395

We may observe that, using the same information from each web page, we have slightly outperformed the results obtained by the other approaches, even though our model was trained with only 3 target web pages on average per query and with 100 iterations of the Expectation-Maximization algorithm.

David Pinto 2007-10-05