Next: Maximum likelihood estimation
Up: Using Query-Relevant Documents Pairs
Previous: Introduction
The QRDP probabilistic model
Let $x$ be a query text in a certain input (source) language, and let $Y$
be a collection of web pages in a different output (target) language. Let $\mathcal{X}$
and $\mathcal{Y}$
be their associated input and output vocabularies, respectively. Given a number
$k$, we have to find the $k$ most relevant web pages with respect to the input query $x$.
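To do this, we have followed a probabilistic approach in which the $k$ most
relevant web pages are computed as those most probable given $x$, i.e.,

\[ \{y_1^*(x), \ldots, y_k^*(x)\} = \operatorname*{arg\,max}_{S \subseteq Y,\ |S| = k}\ \min_{y \in S}\ p(y \mid x) \tag{1} \]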
In the particular case of $k=1$, Equation (1) simplifies to

\[ y_1^*(x) = \operatorname*{arg\,max}_{y \in Y}\ p(y \mid x) \tag{2} \]
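A minimal sketch of the selection step in Equations (1) and (2), assuming the probability of every web page given the query has already been computed. The function name `top_k_pages`, the page identifiers and the scores are hypothetical and serve only as illustration.

```python
import heapq


def top_k_pages(scores, k):
    """Return the ids of the k pages with the highest probability given the query.

    `scores` maps a page id to its (assumed precomputed) probability.
    """
    # nlargest returns the k best (page, score) pairs, best first.
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [page for page, _ in best]


# Hypothetical probabilities for four pages given one query.
scores = {"page_a": 0.02, "page_b": 0.30, "page_c": 0.11, "page_d": 0.07}

best_three = top_k_pages(scores, 3)  # general case, Equation (1)
best_one = top_k_pages(scores, 1)    # special case k = 1, Equation (2)
```

With distinct scores, the set of k pages whose least probable member is as probable as possible is simply the k highest-scoring pages, which is what the sketch returns.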
In this work, $p(y \mid x)$
is modelled by using the well-known IBM alignment model 1 (IBM-1) for
statistical machine translation [6,11]. This model assumes that each word in the
web page is connected to exactly one word in the query. Also, it is assumed that the
query has an initial "null" word to
which words in the web page with no direct connexion are linked.
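Formally, a hidden variable $a = a_1 a_2 \cdots a_{|y|}$ is
introduced to reveal, for each position $i$ in the web page, the query
word position $a_i \in \{0, 1, \ldots, |x|\}$ to which it is
connected, where position 0 stands for the "null" word. Thus,

\[ p(y \mid x) = \sum_{a \in \mathcal{A}(x,y)} p(y, a \mid x) \tag{3} \]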
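where $\mathcal{A}(x,y)$ denotes the set of all possible alignments
between $x$ and $y$. The alignment-completed probability $p(y, a \mid x)$
can be decomposed in terms of individual, web page
position-dependent probabilities as:

\[ p(y, a \mid x) = \prod_{i=1}^{|y|} p(y_i, a_i \mid a_1^{i-1}, y_1^{i-1}, x) \tag{4} \]

\[ p(y, a \mid x) = \prod_{i=1}^{|y|} p(a_i \mid a_1^{i-1}, y_1^{i-1}, x)\; p(y_i \mid a_1^{i}, y_1^{i-1}, x) \tag{5} \]

Here $y_i$ is the word at position $i$ of the web page $y$, and $a_1^{i-1}$ and $y_1^{i-1}$ denote the first $i-1$ alignment variables and web page words, respectively.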
In the case of the IBM-1 model, it is assumed that $a_i$
is uniformly distributed

\[ p(a_i \mid a_1^{i-1}, y_1^{i-1}, x) = \frac{1}{|x|+1} \tag{6} \]
and that $y_i$ only depends on the query word to which it is connected

\[ p(y_i \mid a_1^{i}, y_1^{i-1}, x) = p(y_i \mid x_{a_i}) \tag{7} \]
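By substitution of (6) and (7) in (5), and thereafter of
(5) in (3), some straightforward manipulations allow us to write the IBM-1 model as follows:

\[
\begin{aligned}
p(y \mid x) &= \sum_{a \in \mathcal{A}(x,y)} \prod_{i=1}^{|y|} \frac{1}{|x|+1}\, p(y_i \mid x_{a_i}) \\
            &= \frac{1}{(|x|+1)^{|y|}} \prod_{i=1}^{|y|} \sum_{j=0}^{|x|} p(y_i \mid x_j)
\end{aligned}
\]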
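The IBM-1 probability of a web page given a query can be illustrated with a short Python sketch. It assumes the usual IBM-1 form: every page word contributes the average, over the query words plus the "null" word, of its dictionary probability. The dictionary `t`, the token `NULL`, the function names and the toy entries are hypothetical. `ibm1_brute_force` sums over every alignment as in Equation (3), while `ibm1_closed_form` uses the product-of-sums form reached after the manipulations.

```python
from itertools import product
import math

NULL = "<null>"  # hypothetical token for the initial "null" query word


def ibm1_closed_form(query, page, t):
    """Product over page positions of the averaged dictionary probabilities."""
    words = [NULL] + list(query)
    prob = 1.0
    for w in page:
        # Uniform alignment prior 1 / (|x| + 1), summed over query positions.
        prob *= sum(t.get((w, v), 0.0) for v in words) / len(words)
    return prob


def ibm1_brute_force(query, page, t):
    """Same probability, summing explicitly over all (|x| + 1)^|y| alignments."""
    words = [NULL] + list(query)
    total = 0.0
    for alignment in product(range(len(words)), repeat=len(page)):
        p = 1.0
        for i, j in enumerate(alignment):
            p *= t.get((page[i], words[j]), 0.0) / len(words)
        total += p
    return total


# Toy dictionary: t[(w, v)] is the probability of page word w given query word v.
t = {
    ("la", NULL): 0.1,
    ("la", "the"): 0.7,
    ("la", "house"): 0.1,
    ("casa", "the"): 0.05,
    ("casa", "house"): 0.8,
}
query = ["the", "house"]
page = ["la", "casa"]

closed = ibm1_closed_form(query, page, t)
brute = ibm1_brute_force(query, page, t)
```

On this toy input both functions agree up to floating-point rounding. The brute-force version enumerates every alignment, so only the closed form is practical for pages of realistic length.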
Note that this model is governed only by a statistical dictionary
$\Theta = \{\, p(w \mid v)$,
for all
$v \in \mathcal{X}$ and
$w \in \mathcal{Y} \,\}$.
The model assumes that the order of the words in the query is not important. Therefore, each position in
a document is equally likely to be connected to each position in the query. Although this assumption is
unrealistic in machine translation, we consider that the IBM-1 model is particularly well-suited for our approach.
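The order-independence assumption can be checked numerically. The sketch below assumes the product-of-sums form of IBM-1 and a hypothetical toy dictionary `t`; the function name `ibm1_score` and the example words are illustrative only. It scores one page against every permutation of the same query words.

```python
from itertools import permutations
import math


def ibm1_score(query, page, t, null="<null>"):
    """IBM-1 probability of `page` given `query` (product-of-sums form)."""
    words = [null] + list(query)
    score = 1.0
    for w in page:
        score *= sum(t.get((w, v), 0.0) for v in words) / len(words)
    return score


# Hypothetical dictionary entries: t[(w, v)] is the probability of w given v.
t = {
    ("vuelos", "flights"): 0.6,
    ("vuelos", "cheap"): 0.05,
    ("baratos", "cheap"): 0.5,
    ("madrid", "madrid"): 0.9,
}
query = ["cheap", "flights", "madrid"]
page = ["vuelos", "baratos", "madrid"]

reference = ibm1_score(query, page, t)
# Reordering the query words only reorders the terms of each inner sum.
order_invariant = all(
    math.isclose(ibm1_score(list(q), page, t), reference)
    for q in permutations(query)
)
```

Since a keyword query rarely has a meaningful word order, this invariance costs little in a retrieval setting, whereas it discards useful information when translating full sentences.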
David Pinto
2007-10-05