The Penalisation-Based Ranking Approach
Several information retrieval models have been reported in the literature [4] [2]. The most popular is the vector space model; in practice, however, this model is not viable. In this work, we used a variation of the boolean model with ranking based on the Jaccard similarity formula. We named this variation "Jaccard with penalisation" because it penalises the ranking score according to the number of query terms that are actually matched when the query is compared with a document of the collection. The formula used is presented as follows:
As can be seen, the first component of this formula is the standard Jaccard coefficient. The formula is quite fast to evaluate, which makes it practical to implement in real settings. The results obtained with this approach are presented in the next section.
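The ranking idea described above can be illustrated with a minimal Python sketch. The Jaccard component follows the usual definition over term sets. The penalty term is an assumption made only for illustration (the fraction of query terms that the document does not match) and should not be read as the exact formula used in this work. The function names `jaccard`, `penalised_score` and `rank` are likewise hypothetical.

```python
def jaccard(query_terms, doc_terms):
    """Standard Jaccard coefficient between two term collections."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    if not union:
        return 0.0
    return len(q & d) / len(union)


def penalised_score(query_terms, doc_terms):
    """Jaccard score minus a penalty for unmatched query terms.

    ASSUMPTION: the penalty is 1 - |Q intersect D| / |Q|, i.e. the share of
    query terms that the document fails to match. This is an illustrative
    choice, not necessarily the penalisation defined in the text.
    """
    q, d = set(query_terms), set(doc_terms)
    if not q:
        return 0.0
    matched = len(q & d)
    penalty = 1.0 - matched / len(q)
    return jaccard(q, d) - penalty


def rank(query_terms, collection):
    """Return (doc_id, score) pairs sorted by decreasing penalised score.

    `collection` maps a document identifier to its list of terms.
    """
    scored = [
        (doc_id, penalised_score(query_terms, terms))
        for doc_id, terms in collection.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this assumed penalty, a document that matches every query term keeps its plain Jaccard score, while a document that matches none of them receives a score of -1. Only set operations are involved, which is consistent with the observation that the formula is fast to evaluate.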
David Pinto
2007-05-08