Training English-German WMT15 NMT engine

Thanks a lot, it works now. :smiley:

When I test the accuracy of the model on the newstest2013 dataset, the perplexity scores I see are different from what is reported above.

th translate.lua -model models/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 -src …/newstest/dev/newstest2013.en.tok -tgt …/newstest/dev/newstest2013.de.tok -output pred.txt -gpuid 1 | grep PPL
[02/08/17 01:27:08 INFO] PRED AVG SCORE: -0.4954, PRED PPL: 1.6412
[02/08/17 01:27:08 INFO] GOLD AVG SCORE: -2.2298, GOLD PPL: 9.2984

What is the difference between PRED and GOLD PPL? Neither one seems to be the 7.19 score shown above.
Any ideas on what might be different/wrong with my setup that is leading to these different scores?
Thanks for the help.
-Ganesh

  • PRED PPL is the perplexity of the model’s own predictions
  • GOLD PPL is the perplexity of the gold data according to your model

As you ran the translation on the validation set, GOLD PPL should be 7.19. However, perplexity is not computed the same way during training and translation. I think the difference is mostly due to the padding, which is taken into account during training but not during translation.
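For what it’s worth, the reported perplexity is just the exponential of the negative average per-token score, so the two numbers on each log line above are consistent with each other. A quick check in the th REPL:

th> math.exp(0.4954)   -- ≈ 1.64, the PRED PPL
th> math.exp(2.2298)   -- ≈ 9.30, the GOLD PPL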

Thank you for the response. So just to be sure, I should be looking at GOLD PPL, not so much the PRED PPL for my model’s accuracy?
Thanks again for the explanation.
Ganesh

Yes, but when you have gold data you usually care more about the BLEU score. PRED PPL is not very informative because the model is expected to have high confidence in its own predictions.

Does the translate.lua script (or any other script in ONMT) print the BLEU score given the gold data?

No, it does not. You just need to get the multi-bleu.perl script from the Moses scripts.
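Assuming the script has been downloaded next to the files from the command above, computing BLEU looks like this, with the tokenized reference as the argument and the predictions on stdin:

perl multi-bleu.perl newstest2013.de.tok < pred.txt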

Thanks - that is super helpful. I will take a look.

Thanks
Ganesh

Hello

I’ve followed the instructions in this tutorial (thanks for them!), but I have a problem.
(I also added the -mode aggressive option during tokenization so that I can use the released model.)
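For reference, the tokenization call looks roughly like this (a sketch; the tutorial may list additional tokenization options that the released model expects):

th tools/tokenize.lua -mode aggressive < newstest2013.en > newstest2013.en.tok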

The problem I have is similar to what livenletdie had above.
I tried to evaluate the released model with the command below:

th translate.lua -model onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 -src data/wmt15-de-en/newstest2013.en.tok -tgt data/wmt15-de-en/newstest2013.de.tok -output pred_new.txt -gpuid 3

What I’ve got in the end is:
[02/13/17 19:12:38 INFO] PRED AVG SCORE: -0.46, PRED PPL: 1.59
[02/13/17 19:12:38 INFO] GOLD AVG SCORE: -inf, GOLD PPL: inf

I presumed that something was wrong with the gold data “newstest2013.de” and retried the preprocessing steps several times, but could not solve the problem.
(I get bad scores on the gold data, sometimes worse than -100.)

Are there any possible solutions?

Thanks in advance for your kind help.

Oh there was actually a small error when reporting the final score on the gold data. You may want to update the project and retry.

https://github.com/OpenNMT/OpenNMT/commit/af47fa34710eb35d98e5b162ff0917aafc1b3411
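Updating simply means pulling the latest code into your OpenNMT checkout, for example:

cd OpenNMT && git pull

then rerun the translate.lua command above.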

It works now!
Thank you very much.

4 posts were split to a new topic: Issues when running the English-German WMT15 training

Hi, how can I create the -valid_src/-valid_tgt files used with the preprocess.lua command? Are they required by the command? Can I just use a subset of the -train_src/-train_tgt data for them? Thanks!

Yes, they are required.

Yes, you can use a subset of the -train_src/-train_tgt data.

2000 sentences are enough for the validation set.
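For reference, a minimal preprocess.lua call with a validation set looks roughly like this (file names are placeholders):

th preprocess.lua -train_src train.en.tok -train_tgt train.de.tok \
    -valid_src valid.en.tok -valid_tgt valid.de.tok \
    -save_data wmt15-all.en-de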

I got it. Thanks a lot, Guillaume!

Hi,

I want to translate with a bidirectional model I trained, so I added -config, but it reported an unknown option. Does that mean I don’t need to feed in any config?

 th translate.lua -config ../en-de_gru_brnn_4.txt \
   -model ../wmt15-de-en/en-de_gru_brnn_4/wmt15-en-de-gru-brnn-4_epoch1_42.90.t7 \
   -src ../wmt15-de-en/test/newstest2014-deen-src.en.tok -output ../epoch1.txt -gpuid 1
 /home/lijun/torch/install/bin/luajit: ./onmt/utils/ExtendedCmdLine.lua:198: unkown option brnn

I used brnn = true to train the model. Did I use the wrong option for translation?

Hi - you do not need to use -config for translation - the options are part of the model.
The error message says that -brnn is not a translate.lua option.
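In other words, dropping the -config line from your command should be enough, e.g.:

 th translate.lua -model ../wmt15-de-en/en-de_gru_brnn_4/wmt15-en-de-gru-brnn-4_epoch1_42.90.t7 \
   -src ../wmt15-de-en/test/newstest2014-deen-src.en.tok -output ../epoch1.txt -gpuid 1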

Hi,

I tried to run the BPE version with 2 layers, with the other settings the same as the non-BPE version, but I only got a BLEU score of 18.69 versus the 19.34 you report, which seems like a big gap. Is there anything I should pay attention to?
Besides, I ran a 4-layer bi-LSTM but only got a BLEU score of 16.95. Is that reasonable? I would expect 4 layers to be better than 2 (according to Google’s massive exploration of NMT architectures).

Thanks very much.

Maybe just provide the 2 command lines you used for your training.
That might help to see what could lead to the different results.

For the 4-layer bi-LSTM, I made a config file with “brnn = True” and “layers = 4”, then ran:
$ th train.lua -config xxx -data xxx -save_model xxx -gpuid 1
For the BPE version, I prepared the data as in the “Training Romance Multi-Way model” tutorial, then ran the same command line.
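For comparison, a minimal sketch of such a config file, assuming the usual option = value format where the option names are the train.lua flags without the leading dash:

brnn = true
layers = 4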

Could someone provide a recipe for running a deep model? Thanks a lot.