A simple combination of SMT and NMT

A tool like Moses brings several interesting things:

  • a large vocabulary
  • the ability to disambiguate words or groups of words from their context
  • the ability to provide the sub-sentence alignment of its translation

Here is a French sentence from the COOKING domain:

Les pâtisseries orientales nous font voyager en quelques bouchées, au Maghreb et Moyen-Orient.

For the demonstration, here is the ONMT translation obtained with an off-domain COMPUTING model:

Orientales pâtisseries are making us travel in a few bouchées , in the Maghreb and in the Middle East .

Here is the Moses translation obtained with the same kind of off-domain COMPUTING model:

Pastries eastern us travel in a few jammed, the Maghreb and the Middle East.

Here is the alignment provided by Moses. On each line, the phrase pair comes first and, after the slash, the pair I obtained by keeping only the longest word on each side (of course, this is a naive test; it would have been much more effective to properly remove stop words and punctuation):

AL : les pâtisseries => pastries / pâtisseries => pastries
AL : orientales => eastern / orientales => eastern
AL : nous font => us / nous => us
AL : voyager => travel / voyager => travel
AL : en quelques => in a few / quelques => few
AL : bouchées => jammed / bouchées => jammed
AL : , au Maghreb et => , the Maghreb and / Maghreb => Maghreb
AL : Moyen-Orient . => the Middle East . / Moyen-Orient => Middle
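
The reduction shown in the alignment lines can be sketched in a few lines of Python. This is only an illustration of the naive "keep the longest word" rule, not the script actually used; the names longest_word and reduce_alignment are hypothetical, and the phrase pairs are copied by hand from the AL lines rather than read from Moses.

```python
def longest_word(phrase: str) -> str:
    """Return the longest whitespace-separated token (the first one wins on ties)."""
    return max(phrase.split(), key=len)


def reduce_alignment(phrase_pairs):
    """Map the longest source word of each phrase pair to the longest target word."""
    return {longest_word(src): longest_word(tgt) for src, tgt in phrase_pairs}


# Phrase pairs copied by hand from the AL lines (assumption: no Moses parsing here).
phrase_pairs = [
    ("les pâtisseries", "pastries"),
    ("orientales", "eastern"),
    ("nous font", "us"),
    ("voyager", "travel"),
    ("en quelques", "in a few"),
    ("bouchées", "jammed"),
    (", au Maghreb et", ", the Maghreb and"),
    ("Moyen-Orient .", "the Middle East ."),
]

word_map = reduce_alignment(phrase_pairs)
for src, tgt in word_map.items():
    print(f"{src} => {tgt}")
```

Because punctuation tokens are always shorter than the words around them, they drop out on their own here, but a proper stop-word and punctuation filter would be more robust than relying on length.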

This gives me these possible finalizations of the ONMT translation (replacements for the words it left untranslated):

Finalize : Orientales => eastern
Finalize : pâtisseries => pastries
Finalize : bouchées => jammed
Finalize : Maghreb => Maghreb
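
A minimal sketch of this replacement step, under stated assumptions: a token of the ONMT output counts as "untranslated" when it matches, ignoring case, a source word of the reduced alignment. The function name finalize and the capitalization handling are hypothetical choices for the illustration, and the word map is written out by hand.

```python
def finalize(nmt_tokens, word_map):
    """Replace tokens that match a source word (case-insensitively) with its SMT translation."""
    lookup = {src.lower(): tgt for src, tgt in word_map.items()}
    result = []
    for token in nmt_tokens:
        replacement = lookup.get(token.lower())
        if replacement is None:
            result.append(token)  # token was translated by the NMT model, keep it
        elif token[0].isupper():
            # carry the initial capital over to the replacement
            result.append(replacement[0].upper() + replacement[1:])
        else:
            result.append(replacement)
    return result


# Reduced source-word => target-word pairs from the Moses alignment (written by hand).
word_map = {
    "pâtisseries": "pastries",
    "orientales": "eastern",
    "nous": "us",
    "voyager": "travel",
    "quelques": "few",
    "bouchées": "jammed",
    "Maghreb": "Maghreb",
    "Moyen-Orient": "Middle",
}

nmt_output = ("Orientales pâtisseries are making us travel in a few bouchées , "
              "in the Maghreb and in the Middle East .")

mixed = " ".join(finalize(nmt_output.split(), word_map))
print(mixed)
```

The output stays tokenized (spaces before punctuation); detokenization is a separate step. Note that this matching rule would misfire whenever a source word is spelled like a legitimate English word in the output, so a real implementation would need a stricter test for untranslated tokens.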

I finally get this NMT/SMT mixed translation:

Eastern pastries are making us travel in a few jammed, in the Maghreb and in the Middle East.


PS : COMPUTING models were obtained by mixing in-domain sentences with 2M of Europarl.
PS : of course, the final result is still not perfect.