Maximum likelihood estimation

$$L(\Theta) \;=\; \sum_{n=1}^{N}\,\sum_{j=1}^{J_n} \log \sum_{i=1}^{I_n} \frac{1}{I_n}\, p(x_{nj} \mid q_{ni}) \tag{10}$$

with

$$L(\Theta) \;=\; \sum_{n=1}^{N}\,\sum_{j=1}^{J_n} \log \sum_{z_{nj}} \prod_{i=1}^{I_n} \left[ \frac{1}{I_n}\, p(x_{nj} \mid q_{ni}) \right]^{z_{nji}} \tag{11}$$

where, for convenience, the alignment variable, $a_{nj}$, has been rewritten as an indicator vector in (11), $z_{nj} = (z_{nj1}, \ldots, z_{njI_n})^t$, with a 1 in the query position to which it is connected, and zeros elsewhere.
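As a hypothetical illustration, the incomplete log-likelihood (10) can be sketched in Python, assuming a Model 1-style parameterisation in which each position of the observed segment may align uniformly to any query position; the names `log_likelihood` and `t_prob` are illustrative, not from the source.

```python
import math

def log_likelihood(corpus, t_prob):
    """Incomplete log-likelihood: sum over pairs (x, q) of log p(x | q),
    with the hidden alignments marginalised out (one term per position of x).
    t_prob[(u, v)] plays the role of the translation probability p(u | v)."""
    ll = 0.0
    for x, q in corpus:
        for u in x:
            # Marginalise the alignment of this position over all query
            # positions, each taken with uniform probability 1/|q|.
            ll += math.log(sum(t_prob[(u, v)] for v in q) / len(q))
    return ll

# With a single pair and a certain translation, the likelihood is 1, log is 0.
print(log_likelihood([(["casa"], ["house"])], {("casa", "house"): 1.0}))  # → 0.0
```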

The so-called *complete* version of the log-likelihood
function (10) assumes that the hidden (missing) alignments
are also known:

$$L_c(\Theta) \;=\; \sum_{n=1}^{N}\,\sum_{j=1}^{J_n}\,\sum_{i=1}^{I_n} z_{nji} \log\!\left[ \frac{1}{I_n}\, p(x_{nj} \mid q_{ni}) \right] \tag{12}$$

The EM algorithm maximises (10) iteratively, through the application of two basic steps in each iteration: the E(xpectation) step and the M(aximisation) step. At iteration $k$, the E step computes the expected value of (12) given the observed (incomplete) data and a current estimate of the parameters, $\Theta^{(k)}$. This reduces to the computation of the expected value of $z_{nji}$:

$$z^{(k)}_{nji} \;=\; \frac{p^{(k)}(x_{nj} \mid q_{ni})}{\displaystyle\sum_{i'=1}^{I_n} p^{(k)}(x_{nj} \mid q_{ni'})} \tag{13}$$

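As a concrete sketch of the E step (13), the following hypothetical Python function computes, for each position of the observed segment, its posterior alignment distribution over the query positions; all names are illustrative.

```python
def e_step(x, q, t_prob):
    """For each word u of x, return the posterior probability that u is
    aligned to each query position, given the current translation
    probabilities t_prob[(u, v)] = p(u | v).  Uniform alignment priors
    cancel in the normalisation, leaving only the translation terms."""
    z = []
    for u in x:
        scores = [t_prob[(u, v)] for v in q]
        total = sum(scores)
        z.append([s / total for s in scores])  # normalise over query positions
    return z

# Toy example: one observed word, two candidate query positions.
t = {("casa", "house"): 0.75, ("casa", "the"): 0.25}
print(e_step(["casa"], ["house", "the"], t))  # → [[0.75, 0.25]]
```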
Then, the M step finds a new estimate of $\Theta$, $\Theta^{(k+1)}$, by maximising (12), using (13) instead of the missing $z_{nji}$. This results in:

$$p^{(k+1)}(u \mid v) \;=\; \frac{\displaystyle\sum_{n=1}^{N}\,\sum_{j=1}^{J_n}\,\sum_{i=1}^{I_n} z^{(k)}_{nji}\, \delta(x_{nj}, u)\, \delta(q_{ni}, v)}{\displaystyle\sum_{u'}\,\sum_{n=1}^{N}\,\sum_{j=1}^{J_n}\,\sum_{i=1}^{I_n} z^{(k)}_{nji}\, \delta(x_{nj}, u')\, \delta(q_{ni}, v)} \tag{14}$$

for all $u$ and $v$; where $\delta(a, b)$ is the Kronecker delta function; i.e., $\delta(a, b) = 1$ if $a = b$; 0 otherwise.
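The two steps can be combined into a single EM iteration: accumulate the alignment posteriors of (13) as expected counts, then renormalise them per query word as in (14). The sketch below is a minimal illustration under the same assumptions as before; `em_iteration` and the dictionary layout `t_prob[(u, v)] = p(u | v)` are hypothetical names, not from the source.

```python
from collections import defaultdict

def em_iteration(corpus, t_prob):
    """One EM iteration over a corpus of (x, q) token-list pairs."""
    counts = defaultdict(float)   # expected count of each (u, v) link
    totals = defaultdict(float)   # normaliser: total expected mass per v
    for x, q in corpus:
        for u in x:
            norm = sum(t_prob[(u, v)] for v in q)
            for v in q:
                z = t_prob[(u, v)] / norm  # E step: posterior of the link
                counts[(u, v)] += z        # accumulate expected counts
                totals[v] += z
    # M step: renormalise the expected counts into probabilities.
    return {(u, v): c / totals[v] for (u, v), c in counts.items()}

# Toy corpus: "casa" co-occurs with "house" more consistently than "la" does.
corpus = [(["la", "casa"], ["the", "house"]), (["casa"], ["house"])]
t0 = {(u, v): 0.5 for u in ("la", "casa") for v in ("the", "house")}
t1 = em_iteration(corpus, t0)
print(round(t1[("casa", "house")], 2))  # → 0.75
```

A single iteration already sharpens p(casa | house) from the uniform 0.5 towards the co-occurrence evidence, which is the behaviour the iterative maximisation of (10) is designed to produce.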

An initial estimate for $\Theta$, $\Theta^{(0)}$, is required for the EM algorithm to start. This can be done by assuming that the translation probabilities are uniformly distributed; i.e.,

$$p^{(0)}(u \mid v) \;=\; \frac{1}{|\mathcal{X}|} \tag{16}$$

for all $u$ and $v$, where $\mathcal{X}$ denotes the vocabulary of words observed on the $x$ side of the data.
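The uniform initialisation (16) can be sketched as follows; the function name and the pair-keyed dictionary layout are illustrative assumptions, not from the source.

```python
def uniform_init(corpus):
    """Uniform initial translation table: p(u | v) = 1/|X| for every word u
    in the observed x-vocabulary X and every query word v, so that each
    conditional distribution p(. | v) sums to one."""
    x_vocab = {u for x, _ in corpus for u in x}
    q_vocab = {v for _, q in corpus for v in q}
    p0 = 1.0 / len(x_vocab)
    return {(u, v): p0 for u in x_vocab for v in q_vocab}

# Two x-words observed, so every entry starts at 1/2.
t0 = uniform_init([(["la", "casa"], ["the", "house"])])
print(t0[("casa", "house")])  # → 0.5
```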