We have proposed a new ranking formula for an information retrieval system, based on the Jaccard formula but with a penalisation factor. Evaluated on approximately 75% of the queries from the WebCLEF competition, this approach obtained third place in the overall results among eight participating teams.

An evaluation of the use of diacritization in the task has shown that the results are not significantly different, which may suggest that the set of queries provided for the evaluation does not contain a high number of diacritics. Further investigation is needed to determine whether this behaviour is realistic or must be tuned in further evaluations.
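One way to begin that investigation is to measure what share of the queries contain diacritics at all. The following is a minimal sketch and not the procedure used in our experiments: the function names and the sample queries are hypothetical, and it assumes that a diacritic can be identified as a Unicode combining mark after canonical decomposition (NFD).

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_diacritics(text: str) -> bool:
    """True if the NFD form of text contains at least one combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in decomposed)


def diacritic_query_ratio(queries) -> float:
    """Fraction of queries that contain at least one diacritic."""
    if not queries:
        return 0.0
    affected = sum(1 for q in queries if has_diacritics(q))
    return affected / len(queries)


# Hypothetical sample, not taken from the WebCLEF topic set.
sample_queries = [
    "información turística",
    "european parliament",
    "señal de tráfico",
    "web retrieval",
]
ratio = diacritic_query_ratio(sample_queries)
```

A low ratio on the real topic set would be consistent with the explanation suggested above, whereas a high ratio would point to other causes for the similar results.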

When only the new topic set is used for the comparison, the proposed system ranks fourth; in the overall results, which use both the new topic set and the automatically generated new topics, it ranks second. We believe that the proposed system may be improved through a better understanding of the preprocessing phase.

