EN-DE pretrained OpenNMT-Py model

pytorch

(Tomas V) #1

Hi,

I’m attempting to use the pretrained EN-DE model at the top of http://opennmt.net/Models-py/ .
While the DE-EN model on the same page works for me, the same procedure does not work for the EN-DE model: the translations are seemingly unrelated to the input.
For reference, I’m using the following command line with a git checkout from today (2018-04-12):

python3 translate.py -model averaged-10-epoch.pt -src ../models/en.txt -output pred.txt -verbose

For example, I get:

SENT 1: ('The', 'worst', 'discovery', 'was', 'the', 'fact', 'that', 'in', 'one', 'tenth', 'of', 'households', 'people', 'do', 'not', 'have', 'breakfast', 'at', 'all', '.')
PRED 1: ▁Das ▁Ganze ▁ist ▁einzigartig .
PRED AVG SCORE: -2.5754, PRED PPL: 13.1367
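
As a side note on reading the two numbers in that last line: they are consistent with the perplexity being the exponential of the negated average score. The sketch below assumes the score is an average natural-log probability per predicted token; the function name `perplexity` is only illustrative and is not taken from OpenNMT-py.

```python
import math


def perplexity(avg_log_prob: float) -> float:
    """Perplexity for a given average per-token natural-log probability."""
    return math.exp(-avg_log_prob)


# Average score from the log line above; the result rounds to 13.14,
# in line with the reported PRED PPL of 13.1367.
print(round(perplexity(-2.5754), 2))
```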

Is there something obvious I am missing? I would appreciate any hints.

Best regards

Thomas


(Guillaume Klein) #2

Hello,

Looks like the SentencePiece model required to tokenize the input text is missing from the model package. We will reupload it.

In the meantime, you can download the SentencePiece model separately and apply it to your test file. See:
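
A minimal sketch of that interim step, assuming the `sentencepiece` Python package is installed and the separately downloaded model has been extracted to a local file. The helper names `load_sentencepiece_encoder` and `tokenize_file`, and any file names passed to them, are hypothetical and not part of OpenNMT-py.

```python
from typing import Callable, List


def load_sentencepiece_encoder(model_path: str) -> Callable[[str], List[str]]:
    """Return a function that splits a sentence into SentencePiece pieces."""
    # Imported lazily so the file-handling part works without the package.
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor()
    processor.Load(model_path)
    return processor.EncodeAsPieces


def tokenize_file(src_path: str, out_path: str,
                  encode: Callable[[str], List[str]]) -> int:
    """Write one space-joined line of pieces per input line; return the line count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            out.write(" ".join(encode(line.strip())) + "\n")
            count += 1
    return count
```

The tokenized output file would then be passed to translate.py via -src in place of the raw text file.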


(Tomas V) #3

Thank you, Guillaume!

Best regards

Thomas


(Tomas V) #4

This worked very well, thank you!
For those following this: the SentencePiece model archive is actually a bzip2-compressed tar file even though its extension is tar.gz. tar xf <file> will do the right thing.
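
For anyone who prefers to check this from Python, here is a small sketch using only the standard library. It inspects the leading magic bytes rather than trusting the extension, and opens the archive with tarfile's transparent compression detection. The function names `detect_compression` and `extract_any` are illustrative, and extraction should only be used on archives you trust.

```python
import tarfile
from typing import List


def detect_compression(path: str) -> str:
    """Guess the compression from the file's leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head == b"BZh":
        return "bzip2"
    return "unknown"


def extract_any(path: str, dest: str) -> List[str]:
    """Extract a tar archive regardless of its extension; return member names."""
    # Mode "r:*" lets tarfile detect the compression on its own.
    with tarfile.open(path, "r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```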

Best regards

Thomas


(Guillaume Klein) #5

Thanks for pointing this out. I fixed the archive type.

I also updated the main model package to contain the SentencePiece model.