Multilingual pretrained model: help / weird output


(Fbigi) #1

Hi, I wanted to try the multilingual model onmt_esfritptro-4-1000-600_epoch13_3.12_release_v2.t7, but I didn’t find any specific documentation for it. I expect it to be a pre-trained network that supports ES, FR, IT, PT and RO as both source and target languages. Am I wrong in my assumption?

If not, I have some questions:

  1. Is it possible, and if yes how, to choose the target language?
  2. Is the source language auto-detected? If not, how do I select the source language?
  3. When I translate a simple text, I get weird output back and I don’t understand why. This is my source text:

“Charlene Bevre : ballerina famosa di New York
Sei una famosa ballerina di New York , la tua stella è in piena crescita e la tua notorietà sta valicando i confini dello stato . Incarni lo stereotipo di femme fatale ed hai imparato che puoi ottenere qualsiasi cosa dagli uomini se solo gli lasci intendere che sei disponibile.”

And this is my result:

“apport■ apport■ apport■ empr■ unt■ em■ em■ ing New York famig■ li■ em■ em■ ici■ ente New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ em■ em■ ici■ ente famig■ li■ ar■ ista famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■ ar■ ici■ ente famig■ li■ ar■ ista New York famig■ li■
Tu ești un sac■ er■ do■ te de New York City ■, tu sais que tu peux obtenir tout ce que tu conn■ ais de la femme ■, tu sais que tu peux obtenir tout ce que tu conn■ ais les hommes ■.”

I followed the quick start instructions and issued the following command:
th translate.lua -model onmt_esfritptro-4-1000-600_epoch13_3.12_release_v2.t7 -src src-test.txt -output pred.txt

Any help or suggestion will be greatly appreciated. Thanks a lot!
F.


(Guillaume Klein) #2

Hi,

All information regarding this model can be found here (including how to preprocess the data, add the language selection token, etc.):

The BPE model associated with this model is included in the preprocessed data package:

https://s3.amazonaws.com/opennmt-trainingdata/multi-esfritptro-parallel-tokenized.tgz
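
To make the language selection token a bit more concrete, here is a minimal Python sketch that prepends a source and a target language token to each already tokenized line before it is passed to translate.lua. The token format (`__opt_src_xx` / `__opt_tgt_yy`) and the helper names are assumptions for illustration only; please check the model's documentation for the exact tokens it was trained with. The sketch also does not apply BPE, which would still have to be done with the BPE model from the package above.

```python
# Sketch only: the token format below is an assumption, not confirmed by this thread.
SRC_TOKEN_FMT = "__opt_src_{}"  # hypothetical source-language token
TGT_TOKEN_FMT = "__opt_tgt_{}"  # hypothetical target-language token


def add_language_tokens(line, src_lang, tgt_lang):
    """Prepend source and target language tokens to one tokenized line."""
    prefix = [SRC_TOKEN_FMT.format(src_lang), TGT_TOKEN_FMT.format(tgt_lang)]
    # split() followed by join() also normalizes repeated whitespace
    return " ".join(prefix + line.split())


def tag_lines(lines, src_lang, tgt_lang):
    """Tag every line so the output stays aligned line by line with the input."""
    return [add_language_tokens(line, src_lang, tgt_lang) for line in lines]


sample = ["Sei una famosa ballerina di New York ."]
tagged = tag_lines(sample, "it", "fr")
print(tagged[0])
```

The tagged lines would then be written to the file given to `-src`, so that the model is told explicitly which language to read and which language to produce.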