The Penalisation-Based Ranking Approach

Several information retrieval models are reported in the literature [4] [2]. The most popular is the vector space model; in practice, however, this model is not viable. In this work, we use a variation of the boolean model with ranking based on the Jaccard similarity formula. We name this variation ``Jaccard with penalisation'' because it penalises the ranking score according to the number of terms of a query $ Q_i$ that are actually matched when it is compared with a document $ D_j$ of the collection. The formula is as follows:

$\displaystyle Score(Q_i, D_j) = \frac{\vert D_j \cap Q_i\vert}{\vert D_j \cup Q_i\vert} - \left( 1 - \frac{\vert D_j \cap Q_i\vert}{\vert Q_i\vert} \right)$

As can be seen, the first component of this formula is the standard Jaccard coefficient, and the second component penalises the score by the fraction of query terms that the document does not match. The formula is fast to evaluate, which makes it practical to implement in real settings. The results obtained with this approach are presented in the next section.
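
To make the scoring concrete, the following is a minimal Python sketch of the formula, not the implementation used in the experiments. It assumes that queries and documents are represented as sets of terms, that the first component is the Jaccard coefficient over those sets, and that an empty query is rejected; the names `penalised_jaccard` and `rank` are hypothetical and chosen only for illustration.

```python
def penalised_jaccard(query_terms, doc_terms):
    """Score a document against a query: Jaccard coefficient minus a penalty.

    Both arguments are iterables of terms; duplicates are ignored because
    the formula is defined over sets (an assumption of this sketch).
    """
    q = set(query_terms)
    d = set(doc_terms)
    if not q:
        # |Q_i| = 0 would make the penalty term undefined.
        raise ValueError("query must contain at least one term")

    matched = len(q & d)               # |D_j intersect Q_i|
    jaccard = matched / len(q | d)     # first component of the formula
    penalty = 1.0 - matched / len(q)   # fraction of query terms not matched
    return jaccard - penalty


def rank(query_terms, documents):
    """Return (doc_id, score) pairs sorted from best to worst score.

    `documents` is a hypothetical mapping from document id to its terms.
    """
    scored = [
        (doc_id, penalised_jaccard(query_terms, terms))
        for doc_id, terms in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions the score lies between -1 (no query term is matched) and 1 (the document and the query contain exactly the same terms), so a document that matches only part of the query is ranked below one that matches all of it.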

David Pinto 2007-05-08