
Maximum likelihood estimation

It is not difficult to derive an Expectation-Maximisation (EM) algorithm to perform maximum likelihood estimation of the statistical dictionary with respect to a collection of training samples $ (X,Y)=\{(x_1,y_1),\dotsc,(x_N,y_N)\}$. The (incomplete) log-likelihood function is:

$\displaystyle L(\ensuremath{\boldsymbol{\Theta}})=\sum_{n=1}^N \log\sum_{a_n} p(y_n,a_n\vert x_n)$ (10)

with

$\displaystyle p(y_n,a_n\vert x_n)=\frac{1}{(\vert x_n\vert+1)^{\vert y_n\vert}}\,\prod_{i=1}^{\vert y_n\vert}\prod_{j=0}^{\vert x_n\vert} p(y_{ni}\,\vert\,x_{nj})^{a_{nij}}$ (11)

where, for convenience, the alignment variable $ a_{ni}\in\{0,1,\dotsc,\vert x_n\vert\}$ has been rewritten in (11) as an indicator vector, $ a_{ni}=(a_{ni0}$,$ \dotsc$, $ a_{ni\vert x_n\vert})$, with a $ 1$ at the query position to which $ y_{ni}$ is aligned and zeros elsewhere.
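As a small illustration of this encoding (the helper name is ours, not from the text), the scalar alignment $ a_{ni}=2$ for a query of length $ \vert x_n\vert=3$ becomes a vector of $ \vert x_n\vert+1$ components, position $ 0$ being reserved for the empty word:

```python
# Hypothetical helper: turn a scalar alignment position into the
# indicator-vector form used in (11). Positions run from 0 (the
# empty word) to |x_n|, so the vector has |x_n| + 1 components.
def to_indicator(position, query_length):
    vec = [0] * (query_length + 1)
    vec[position] = 1
    return vec

print(to_indicator(2, 3))  # [0, 0, 1, 0]
```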

The so-called complete version of the log-likelihood function (10) assumes that the hidden (missing) alignments $ a_1,\dotsc,a_N$ are also known:

$\displaystyle \mathcal{L} (\ensuremath{\boldsymbol{\Theta}})=\sum_{n=1}^N \log p(y_n,a_n\vert x_n)$ (12)

The EM algorithm maximises (10) iteratively, through the application of two basic steps in each iteration: the E(xpectation) step and the M(aximisation) step. At iteration $ k$, the E step computes the expected value of (12) given the observed (incomplete) data, $ (X,Y)$, and a current estimation of the parameters, $ \ensuremath{\boldsymbol{\Theta}}^{(k)}$. This reduces to the computation of the expected value of $ a_{nij}$:

$\displaystyle a_{nij}^{(k)} = \frac {p(y_{ni}\,\vert\,x_{nj})^{(k)}} {\sum_{j'}p(y_{ni}\,\vert\,x_{nj'})^{(k)}}$ (13)
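The E step in (13) can be sketched as follows for a single training pair. This is a minimal illustration under an assumed data layout (the dictionary stored as a nested mapping `p[w][v]` holding $ p(w\,\vert\,v)$, and the empty word placed at query position $ 0$); neither the representation nor the function name comes from the text:

```python
def e_step(x, y, p):
    """Expected alignments a[i][j] for one pair (x, y), following (13).

    x: query word list, with the empty word at position 0.
    y: document word list.
    p: nested dict, p[w][v] = p(w | v).
    """
    a = []
    for yi in y:
        # Numerator of (13) for each query position j ...
        row = [p[yi][xj] for xj in x]
        # ... normalised over j' so each row sums to 1.
        total = sum(row)
        a.append([r / total for r in row])
    return a
```

With a uniform dictionary, each word $ y_{ni}$ is aligned to all query positions with equal expected weight $ 1/(\vert x_n\vert+1)$, as the normalisation in (13) implies.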

Then, the M step finds a new estimate of $ \ensuremath{\boldsymbol{\Theta}}$, $ \ensuremath{\boldsymbol{\Theta}}^{(k+1)}$, by maximising (12), using (13) in place of the missing $ a_{nij}$. This results in:

$\displaystyle p(w\,\vert\,v)^{(k+1)} = \frac{\sum_n \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} a_{nij}^{(k)} \,\delta(y_{ni}, w) \,\delta(x_{nj}, v)}{\sum_{w'} \sum_n \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} a_{nij}^{(k)} \,\delta(y_{ni}, w') \,\delta(x_{nj}, v)}$ (14)


$\displaystyle = \frac{\sum_n \frac{p(w\,\vert\,v)^{(k)}}{\sum_{j'}p(w\,\vert\,x_{nj'})^{(k)}} \left[ \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} \,\delta(y_{ni}, w) \,\delta(x_{nj}, v) \right]}{\sum_{w'} \sum_n \frac{p(w'\,\vert\,v)^{(k)}}{\sum_{j'}p(w'\,\vert\,x_{nj'})^{(k)}} \left[ \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} \,\delta(y_{ni}, w') \,\delta(x_{nj}, v) \right]}$ (15)

for all $ v \in \mathcal{X}$ and $ w \in \mathcal{Y}$, where $ \delta(a, b)$ is the Kronecker delta function; i.e., $ \delta(a, b)=1$ if $ a=b$, and $ 0$ otherwise.

An initial estimate for $ \ensuremath{\boldsymbol{\Theta}}$, $ \ensuremath{\boldsymbol{\Theta}}^{(0)}$, is required for the EM algorithm to start. This can be done by assuming that the translation probabilities are uniformly distributed; i.e.,

$\displaystyle p(w\,\vert\,v)^{(0)}=\frac{1}{\vert\mathcal{Y}\vert}$ (16)

for all $ v \in \mathcal{X}$ and $ w \in \mathcal{Y}$.
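Putting the pieces together, the whole procedure can be sketched as below: uniform initialisation (16), expected counts via (13), and renormalisation via (14). This is an illustrative sketch under assumed conventions (word-list pairs, the empty word prepended to each query, nested dicts for the dictionary); the data layout and function name are ours, not from the text:

```python
from collections import defaultdict

def train_dictionary(pairs, iterations=10):
    """EM training of p(w | v) on (query, document) word-list pairs.

    Each query x is assumed to carry the empty word '' at position 0.
    """
    # (16): uniform initialisation over the document vocabulary Y.
    y_vocab = {w for _, y in pairs for w in y}
    x_vocab = {v for x, _ in pairs for v in x}
    p = {w: {v: 1.0 / len(y_vocab) for v in x_vocab} for w in y_vocab}

    for _ in range(iterations):
        # E step: accumulate expected counts using (13). The delta
        # functions in (14) reduce to indexing by the observed words.
        count = defaultdict(lambda: defaultdict(float))
        for x, y in pairs:
            for w in y:
                denom = sum(p[w][v] for v in x)
                for v in x:
                    count[w][v] += p[w][v] / denom
        # M step (14): renormalise the counts over w for each v.
        for v in x_vocab:
            total = sum(count[w][v] for w in y_vocab)
            if total > 0:
                for w in y_vocab:
                    p[w][v] = count[w][v] / total
    return p
```

Each column of the resulting dictionary sums to one over $ w$, as (14) guarantees, and words that frequently co-occur with a query term accumulate most of its probability mass across iterations.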


David Pinto 2007-10-05