Multi language model


I have 2 questions:

  1. Is it possible with OpenNMT to build a model that ingests the same text translated into 3-4 languages and then translates it into a specific one?

The way I see it, the other languages would be optional, but would bring additional information about how to translate the text. I believe this would help a lot with context and feminine/masculine/plural forms.
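One way to realize this "optional extra sources" idea at the data level is to concatenate all available translations of a segment into a single source line, each prefixed with a language tag, so the encoder can use whatever is present. This is a hypothetical preprocessing sketch, not an OpenNMT feature; the tag and separator tokens are my own invention:

```python
# Hypothetical preprocessing: merge parallel source-side translations into
# one tagged input line. Languages that are missing for a given segment are
# simply omitted, which keeps the extra sources optional.
def build_multi_source(translations: dict) -> str:
    """translations maps a language code to that segment's text."""
    parts = [f"<{lang}> {text}" for lang, text in sorted(translations.items())]
    return " <sep> ".join(parts)

line = build_multi_source({"en": "the cat sleeps", "fr": "le chat dort"})
# -> "<en> the cat sleeps <sep> <fr> le chat dort"
```

The tag and separator tokens would need to be added to the vocabulary, and training data would have to mix segments with different subsets of languages so the model learns to cope with missing sources.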

  2. I saw a presentation from Omniscien Technologies where they say low-resource languages benefit from training a model that translates into many languages at the same time. This is really interesting, but once again I'm not sure whether we can do that with OpenNMT?

If you're interested in the video, have a look at this link and start watching at 41 min 30 sec.
Omniscien Technologies


You can find a discussion about this here:

The tutorial is out of date but the idea is still valid.

Since the task can be encoded in the data itself, the approach can work with any NMT training framework.

One of the presenters at the Omniscien webinar told me afterwards in a private communication that they had trained an “into-Tagalog” model using English source data and Indonesian/Spanish/Tagalog target data. He provided no further evidence and I have not followed this up.

This seems really promising to me, as I have the same content translated into so many languages (70).
I'm definitely going to give it a shot.

Is it fair to assume that I will need to use a much bigger model for this?

Terence, I will let you know my results with Tagalog, but based on the article referenced by Guillaume it seems only positive :slight_smile:

I guess it really boils down to the amount of work required to get that extra data, but in my case it's somewhat already done!