EN-DE pretrained OpenNMT-Py model

tom · April 12, 2018, 2:08pm

Hi,

I’m attempting to use the pretrained EN-DE model at the top of http://opennmt.net/Models-py/ .
The DE-EN model on the same page works for me, but the same procedure fails for the EN-DE model: the translations seem unrelated to the input.
For reference, I’m using the following command line with a git checkout from today (2018-04-12):

python3 translate.py -model averaged-10-epoch.pt -src ../models/en.txt -output pred.txt -verbose

For example, I get:

SENT 1: ('The', 'worst', 'discovery', 'was', 'the', 'fact', 'that', 'in', 'one', 'tenth', 'of', 'households', 'people', 'do', 'not', 'have', 'breakfast', 'at', 'all', '.')
PRED 1: ▁Das ▁Ganze ▁ist ▁einzigartig .
PRED AVG SCORE: -2.5754, PRED PPL: 13.1367

Is there something obvious I am missing? I would appreciate any hint.

Best regards

Thomas

guillaumekln · April 13, 2018, 7:48am

Hello,

Looks like the SentencePiece model required to tokenize the input text is missing from the model package. We will reupload it.

In the meantime, you can download the SentencePiece model separately and apply it to your test file before translating. See:
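
As a rough illustration of that tokenization step, here is a minimal sketch assuming the `sentencepiece` Python package is installed. The file names `sentencepiece.model`, `en.txt` and `en.sp.txt` are hypothetical placeholders, not names confirmed in this thread.

```python
def load_encoder(model_path):
    """Return a function that maps a sentence to a list of SentencePiece pieces."""
    # Third-party dependency (assumed installed): pip install sentencepiece
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.Load(model_path)
    return sp.EncodeAsPieces


def tokenize_file(encode, src_path, out_path):
    """Write one line of space-joined pieces for every line of the input file."""
    with open(src_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            pieces = encode(line.strip())
            out.write(" ".join(pieces) + "\n")


# Hypothetical usage (file names are placeholders):
# tokenize_file(load_encoder("sentencepiece.model"), "en.txt", "en.sp.txt")
```

The tokenized file (here `en.sp.txt`) would then be passed to `translate.py` via `-src` in place of the raw text. The predictions come back as pieces such as `▁Das`, so they still need to be detokenized with the same SentencePiece model.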

tom · April 14, 2018, 9:38am

Thank you, Guillaume!

Best regards

Thomas

tom · April 14, 2018, 1:44pm

This worked very well, thank you!
For those following along: the SentencePiece model archive is actually a bzip2-compressed tar, even though its extension is tar.gz. tar xf <file> detects the compression automatically and will do the right thing.
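
For anyone unpacking such an archive from a script rather than the shell, the sketch below shows one way to do the same thing in Python. `detect_compression` and `extract_any` are illustrative helper names, and the archive path in the usage comment is a placeholder. The first helper checks the leading magic bytes instead of trusting the extension; the second relies on `tarfile`'s `"r:*"` mode, which selects the decompressor transparently.

```python
import tarfile


def detect_compression(path):
    """Guess the compression format from the file's leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    if head.startswith(b"BZh"):
        return "bzip2"
    if head.startswith(b"\xfd7zXZ\x00"):
        return "xz"
    return "unknown"


def extract_any(path, dest="."):
    """Extract a tar archive regardless of which compression it really uses."""
    # "r:*" asks tarfile to detect the compression itself, ignoring the extension.
    with tarfile.open(path, "r:*") as tar:
        tar.extractall(dest)


# Hypothetical usage (the path is a placeholder):
# print(detect_compression("sentencepiece.tar.gz"))
# extract_any("sentencepiece.tar.gz", "models")
```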

Best regards

Thomas

guillaumekln · April 16, 2018, 11:16am

Thanks for pointing this out. I fixed the archive type.

I also updated the main model package to contain the SentencePiece model.