Translate using pre-trained English -> German model in PyTorch

I am trying to reproduce results using the pre-trained English -> German (WMT) model included in the PyTorch documentation. I use the pre-trained model to translate test.en, a file included in the WMT data archive obtained from the link under the ‘Corpus Prep’ column of the English -> German (WMT) table, i.e. https://s3.amazonaws.com/opennmt-trainingdata/wmt_ende_sp.tar.gz.

I see unsatisfactory results compared with test.de when I run the following command:
python translate.py -model transformer-ende-wmt-pyOnmt/averaged-10-epoch.pt -src data/test.en -tgt data/test.de -verbose

SENT 1: ('▁28', '-', 'Y', 'ear', '-', 'O', 'ld', '▁Chef', '▁Found', '▁Dead', '▁at', '▁San', '▁Francisco', '▁Mal', 'l')
PRED 1: ▁28 - Jahr - O ld ▁Chef ▁Found ▁Dead
PRED SCORE: -7.7273
GOLD 1: ▁28 - jährige r ▁Koch ▁in ▁San ▁Francisco ▁Mal l ▁to t ▁auf gefunden
GOLD SCORE: -25.5761

SENT 2: ('▁A', '▁28', '-', 'year', '-', 'old', '▁chef', '▁who', '▁had', '▁recently', '▁moved', '▁to', '▁San', '▁Francisco', '▁was', '▁found', '▁dead', '▁in', '▁the', '▁sta', 'ir', 'well', '▁of', '▁a', '▁local', '▁mall', '▁this', '▁week', '.')
PRED 2: ▁Ein ▁28 - jährige r ▁Küchen chef , ▁der ▁vor ▁kurze m ▁nach ▁San ▁Francisco ▁ zog , ▁wurde ▁diese ▁Woche ▁im ▁Trepp en haus ▁eines ▁lokale n ▁Einkaufszentrum s ▁to t ▁auf gefunden .
PRED SCORE: -13.8766
GOLD 2: ▁Ein ▁28 - jährige r ▁Koch , ▁der ▁vor ▁kurze m ▁nach ▁San ▁Francisco ▁gezogen ▁ist , ▁wurde ▁im ▁Trepp en haus ▁eines ▁ örtlich en ▁Einkauf zentrum s ▁to t ▁auf gefunden .
GOLD SCORE: -34.4022

I have several questions regarding the pre-trained model:

1. I want to know exactly which data the model was trained on and which data it was tested on. I am assuming that the train.de file from the link under the ‘Corpus Prep’ column of the English -> German (WMT) pre-trained model table was used for training. Am I right about this? And is the model tested on the test.de file? If so, what is the expected BLEU score on test.de?

2. Do the test.de and train.de files contain preprocessed data, or do I need to perform preprocessing steps before feeding them to translate.py? If so, what are the required preprocessing steps?

3. I also want to run the model on the WMT14 and WMT17 datasets to check whether I achieve the BLEU scores of 26.89 and 28.09 respectively. When testing on these datasets, do I need to perform any preprocessing steps?

4. I have seen in another post that using a SentencePiece model to tokenize the test data increases the model’s accuracy. However, I am unable to find any documentation on how to use the SentencePiece.model file to perform tokenization. Can you please elaborate on how to use this file? (I am using the PyTorch implementation of OpenNMT.)

5. In the English -> German (WMT) pre-trained model table, the link under the ‘Translation Parameters’ column points to documentation titled ‘How do I use the Transformer model?’, which lists the training parameters but not the translation parameters. Is that the correct link?

Thanks in advance!


If you want to better understand the pipeline, we posted the full script in the TensorFlow version here:

If you want to understand the syntax of the translate command in onmt-py, just look at the docs or the README.

You’ll see that your syntax is incorrect.


I am also following the same PyTorch documentation.

I ran the same command posted here:
python translate.py -model transformer-ende-wmt-pyOnmt/averaged-10-epoch.pt -src data/test.en -tgt data/test.de -verbose

The resulting BLEU score was:
BLEU = 33.31, 61.7/39.2/27.6/20.1 (BP=0.979, ratio=0.979, hyp_len=82004, ref_len=83752)
which I got from running:
perl tools/multi-bleu.perl data/test.de < pred

I didn’t change anything from the documentation; I just downloaded the prepared data and the pre-trained model from the same website (http://opennmt.net/Models-py/, English -> German).

But the BLEU score seems weird.
Does anyone have an idea, or has anyone experienced the same thing?

Guys:
-tgt is for when you provide a gold target.
Use -output to direct the translations into an output file, e.g.:
python translate.py -model transformer-ende-wmt-pyOnmt/averaged-10-epoch.pt -src data/test.en -output pred -verbose

BLEU is calculated on detokenized output. This model uses subword tokenization, which is why you see such a high BLEU score.
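To illustrate the detokenization point, here is a minimal sketch (not OpenNMT’s actual detokenizer) of how SentencePiece subword pieces, like those in the PRED/GOLD lines above, are joined back into plain text before computing BLEU. The "▁" marker (U+2581) denotes a word boundary:

```python
def detokenize(pieces):
    """Join SentencePiece subword pieces back into plain text.

    Pieces that start a new word carry a leading "\u2581" marker;
    all other pieces attach directly to the previous one.
    """
    return "".join(pieces).replace("\u2581", " ").strip()

# Pieces as they appear in the predictions above
pieces = ["\u2581Ein", "\u258128", "-", "j\u00e4hrige", "r", "\u2581Koch"]
print(detokenize(pieces))  # -> Ein 28-jähriger Koch
```

Scoring the raw subword output against a raw subword reference (as multi-bleu.perl does above) inflates the n-gram matches, which is why the 33.31 figure is not comparable to the published detokenized scores.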

Again, follow the scripts in the wmt folder of the onmt-tf version and you will understand.
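Regarding the SentencePiece question above: assuming the sentencepiece Python package is installed (pip install sentencepiece) and the .model file has been extracted from the archive (the path below is an assumption, point it at your copy), a minimal sketch of encoding before translation and decoding afterwards would be:

```python
import sentencepiece as spm

# Path is an assumption -- use the .model file extracted from the archive
sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")

# Encode raw source text into subword pieces before feeding translate.py
pieces = sp.encode("A 28-year-old chef was found dead.", out_type=str)
line_for_translate = " ".join(pieces)

# After translation, decode the predicted pieces back into plain text
# so that BLEU can be computed on detokenized output
plain_text = sp.decode(line_for_translate.split())
```

If the test files in the archive are already in subword form (as the ▁ markers in your output suggest), only the decode step is needed before scoring.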
