
Maximum likelihood estimation

It is not difficult to derive an Expectation-Maximisation (EM) algorithm to perform maximum likelihood estimation of the statistical dictionary with respect to a collection of training samples $ (X,Y)=\{(x_1,y_1),\dotsc,(x_N,y_N)\}$. The (incomplete) log-likelihood function is:

$\displaystyle L(\ensuremath{\boldsymbol{\Theta}})=\sum_{n=1}^N \log\sum_{a_n} p(y_n,a_n\vert x_n)$ (10)

with

$\displaystyle p(y_n,a_n\vert x_n)=\frac{1}{(\vert x_n\vert+1)^{\vert y_n\vert}}\,\prod_{i=1}^{\vert y_n\vert}\prod_{j=0}^{\vert x_n\vert} p(y_{ni}\,\vert\,x_{nj})^{a_{nij}}$ (11)

where, for convenience, the alignment variable $ a_{ni}\in\{0,1,\dotsc,\vert x_n\vert\}$ has been rewritten in (11) as an indicator vector, $ a_{ni}=(a_{ni0}$,$ \dotsc$, $ a_{ni\vert x_n\vert})$, with a $ 1$ at the query position to which $ y_{ni}$ is aligned and zeros elsewhere.
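As a small illustration of this encoding (the helper name is ours, not from the text), the scalar alignment $ a_{ni}=2$ for a query of length $ \vert x_n\vert=3$ becomes a vector of $ \vert x_n\vert+1$ components, position $ 0$ being reserved for the empty word:

```python
# Hypothetical helper: turn a scalar alignment position into the
# indicator-vector form used in (11). Positions run from 0 (the
# empty word) to |x_n|, so the vector has |x_n| + 1 components.
def to_indicator(position, query_length):
    vec = [0] * (query_length + 1)
    vec[position] = 1
    return vec

print(to_indicator(2, 3))  # [0, 0, 1, 0]
```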

The so-called complete version of the log-likelihood function (10) assumes that the hidden (missing) alignments $ a_1,\dotsc,a_N$ are also known:

$\displaystyle \mathcal{L} (\ensuremath{\boldsymbol{\Theta}})=\sum_{n=1}^N \log p(y_n,a_n\vert x_n)$ (12)

The EM algorithm maximises (10) iteratively, through the application of two basic steps in each iteration: the E(xpectation) step and the M(aximisation) step. At iteration $ k$, the E step computes the expected value of (12) given the observed (incomplete) data, $ (X,Y)$, and a current estimation of the parameters, $ \ensuremath{\boldsymbol{\Theta}}^{(k)}$. This reduces to the computation of the expected value of $ a_{nij}$:

$\displaystyle a_{nij}^{(k)} = \frac {p(y_{ni}\,\vert\,x_{nj})^{(k)}} {\sum_{j'}p(y_{ni}\,\vert\,x_{nj'})^{(k)}}$ (13)
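The E step in (13) can be sketched as follows for a single training pair. This is a minimal illustration under an assumed data layout (the dictionary stored as a nested mapping `p[w][v]` holding $ p(w\,\vert\,v)$, and the empty word placed at query position $ 0$); neither the representation nor the function name comes from the text:

```python
def e_step(x, y, p):
    """Expected alignments a[i][j] for one pair (x, y), following (13).

    x: query word list, with the empty word at position 0.
    y: document word list.
    p: nested dict, p[w][v] = p(w | v).
    """
    a = []
    for yi in y:
        # Numerator of (13) for each query position j ...
        row = [p[yi][xj] for xj in x]
        # ... normalised over j' so each row sums to 1.
        total = sum(row)
        a.append([r / total for r in row])
    return a
```

With a uniform dictionary, each word $ y_{ni}$ is aligned to all query positions with equal expected weight $ 1/(\vert x_n\vert+1)$, as the normalisation in (13) implies.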

Then, the M step finds a new estimate of $ \ensuremath{\boldsymbol{\Theta}}$, $ \ensuremath{\boldsymbol{\Theta}}^{(k+1)}$, by maximising (12), using (13) in place of the missing $ a_{nij}$. This results in:

$\displaystyle p(w\,\vert\,v)^{(k+1)} = \frac{\sum_n \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} a_{nij}^{(k)} \,\delta(y_{ni}, w) \,\delta(x_{nj}, v)}{\sum_{w'} \sum_n \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} a_{nij}^{(k)} \,\delta(y_{ni}, w') \,\delta(x_{nj}, v)}$ (14)


$\displaystyle = \frac{\sum_n \frac{p(w\,\vert\,v)^{(k)}}{\sum_{j'}p(w\,\vert\,x_{nj'})^{(k)}} \left[ \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} \,\delta(y_{ni}, w) \,\delta(x_{nj}, v) \right]}{\sum_{w'} \sum_n \frac{p(w'\,\vert\,v)^{(k)}}{\sum_{j'}p(w'\,\vert\,x_{nj'})^{(k)}} \left[ \sum_{i=1}^{\vert y_n\vert} \sum_{j=0}^{\vert x_n\vert} \,\delta(y_{ni}, w') \,\delta(x_{nj}, v) \right]}$ (15)

for all $ v \in \mathcal{X}$ and $ w \in \mathcal{Y}$, where $ \delta(a, b)$ is the Kronecker delta function; i.e., $ \delta(a, b)=1$ if $ a=b$, and $ 0$ otherwise.

An initial estimate for $ \ensuremath{\boldsymbol{\Theta}}$, $ \ensuremath{\boldsymbol{\Theta}}^{(0)}$, is required for the EM algorithm to start. This can be done by assuming that the translation probabilities are uniformly distributed; i.e.,

$\displaystyle p(w\,\vert\,v)^{(0)}=\frac{1}{\vert\mathcal{Y}\vert}$ (16)

for all $ v \in \mathcal{X}$ and $ w \in \mathcal{Y}$.
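Putting the pieces together, the whole procedure can be sketched as below: uniform initialisation (16), expected counts via (13), and renormalisation via (14). This is an illustrative sketch under assumed conventions (word-list pairs, the empty word prepended to each query, nested dicts for the dictionary); the data layout and function name are ours, not from the text:

```python
from collections import defaultdict

def train_dictionary(pairs, iterations=10):
    """EM training of p(w | v) on (query, document) word-list pairs.

    Each query x is assumed to carry the empty word '' at position 0.
    """
    # (16): uniform initialisation over the document vocabulary Y.
    y_vocab = {w for _, y in pairs for w in y}
    x_vocab = {v for x, _ in pairs for v in x}
    p = {w: {v: 1.0 / len(y_vocab) for v in x_vocab} for w in y_vocab}

    for _ in range(iterations):
        # E step: accumulate expected counts using (13). The delta
        # functions in (14) reduce to indexing by the observed words.
        count = defaultdict(lambda: defaultdict(float))
        for x, y in pairs:
            for w in y:
                denom = sum(p[w][v] for v in x)
                for v in x:
                    count[w][v] += p[w][v] / denom
        # M step (14): renormalise the counts over w for each v.
        for v in x_vocab:
            total = sum(count[w][v] for w in y_vocab)
            if total > 0:
                for w in y_vocab:
                    p[w][v] = count[w][v] / total
    return p
```

Each column of the resulting dictionary sums to one over $ w$, as (14) guarantees, and words that frequently co-occur with a query term accumulate most of its probability mass across iterations.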


David Pinto 2007-10-05