Multiple Targets

ymoslem · July 21, 2022, 6:41pm

Hello!

OpenNMT-tf supports multiple sources. Any suggestions on how to apply multiple targets? I want to have one input and generate two outputs, e.g. in two languages. I know that we could train a multilingual model. However, in my case the input might not be the exact wording, so I suppose that if I generate the two outputs independently, they might not match each other.

Thanks!
Yasmin

SamuelLacombe · July 21, 2022, 10:12pm

Hello,

Do you want the 2 outputs from the same call? Or could it be the same model and 2 calls?

Personally, I would put a tag for the target language in the source, and the result would be in the language specified by that tag.

Ex:

Hello <fr> ---> Bonjour
Hello <it> ---> Ciao

Never done it, but that’s what I would have tried first. If you output both at the same time, there might be a risk that the first language influences the output of the other one.
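A minimal sketch of how the tagged training pairs above could be prepared, assuming plain-text data and assuming the tags (`<fr>`, `<it>`) are kept as single tokens in the source vocabulary. The helper names are hypothetical and not part of OpenNMT-tf:

```python
def tag_source(source, target_lang):
    """Append a target-language tag such as <fr> to the source sentence."""
    return f"{source} <{target_lang}>"


def build_tagged_pairs(source, targets):
    """Create one (tagged source, target) training pair per target language."""
    return [(tag_source(source, lang), text) for lang, text in targets.items()]


# One input sentence with its translations, keyed by language code.
pairs = build_tagged_pairs("Hello", {"fr": "Bonjour", "it": "Ciao"})
for src, tgt in pairs:
    print(f"{src} ---> {tgt}")
```

At inference time, the same model would then be called once per tag, so the two outputs are generated independently of each other.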

ymoslem · July 21, 2022, 11:38pm

Thanks, Samuel!

The thing is that the input here might not be the exact translation while both outputs should have the same meaning. For example:

Greet them <fr> ---> Bonjour
Greet them <it> ---> Ciao

If the outputs are not interacting with each other in a certain way, there is a chance that the French output would not be a translation of the Italian output.

Greet them <fr> ---> Comment ça va?
Greet them <it> ---> Ciao

Maybe what I am trying to achieve here is more complicated than it should be. Anyhow, the purpose of the discussion is to see how you and other colleagues would tackle such a task.

Kind regards,
Yasmin

SamuelLacombe · July 22, 2022, 3:53am

Well, you’re just providing more context!

I’m working with about 70 languages, and the translators are all translating the same texts. I have 2-3 languages that are superfast and their translations are really accurate. I’m not sure if in your case it would be a similar pattern, but what I want to do eventually is explained in this post:

Multi language model

So there would be one language in the output, but I will leverage the other language translations in the source.

My thought process on this one is: suppose a certain English word corresponds to 10 different words in French, but you also have the same English sentence translated into Italian, where the Italian word maps 1 to 1 to the French one. If you leverage English + Italian, your accuracy should increase significantly.

Keeping the English would be important in this case, as it’s the main reference. I would expect the model to learn when to rely more on the main reference rather than on the other language.
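One simple way to try this idea without any specific multi-source API is to concatenate the auxiliary translation onto the English source. This is only a sketch under assumptions: the `<aux>` separator token and the helper name are hypothetical, and the separator would need to be in the source vocabulary.

```python
AUX_SEP = "<aux>"  # hypothetical separator token, not an OpenNMT-tf convention


def build_source(english, auxiliary=None):
    """Return the English reference, optionally followed by an auxiliary translation."""
    if not auxiliary:
        # No auxiliary translation available: fall back to English only.
        return english
    return f"{english} {AUX_SEP} {auxiliary}"


# (English, Italian or None, French) triples; English always comes first
# because it is the main reference.
examples = [
    ("Hello", "Ciao", "Bonjour"),
    ("Thank you", None, "Merci"),
]
sources = [build_source(en, it) for en, it, _ in examples]
targets = [fr for _, _, fr in examples]
```

Mixing examples with and without the auxiliary language in training is one way to let the model still work when only the English is available.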

I don’t know if this gives you any additional ideas.

Best regards,
Samuel

anderleich · August 3, 2022, 11:20am

Hi @ymoslem ,

If I had to approach this task, I would start by just concatenating both outputs in the training data. I would also use a <SEP> token between them.

Greet them ---> Bonjour <SEP> Ciao

The system would need to generate both sentences in the same decoding phase, and the second language sentence would be influenced by the first language sentence.
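A minimal sketch of the data side of this idea, assuming plain-text targets and assuming `<SEP>` is kept as a single token in the target vocabulary. The helper names are hypothetical:

```python
SEP = "<SEP>"


def join_targets(first, second):
    """Build one training target that contains both languages."""
    return f"{first} {SEP} {second}"


def split_targets(output):
    """Split a decoded line back into two outputs.

    If the model did not produce the separator, the second output is empty.
    """
    first, _, second = output.partition(SEP)
    return first.strip(), second.strip()


line = join_targets("Bonjour", "Ciao")
french, italian = split_targets(line)
print(line)
print(french, "|", italian)
```

The order of the two languages should stay fixed across the training data, since the second sentence is decoded conditioned on the first.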

I’m just guessing. I’ve never done that on the target side. However, I think you might get something interesting.

Let me know if you manage to get some decent results.