WMT17 EN-DE Benchmark

In a recent paper (Sockeye: A Toolkit for Neural Machine Translation), some results were published for OpenNMT-Lua.

I would like to publish mine.

Settings:
Corpus: CommonCrawl, Europarl, NewscommentaryV12, Rapid2016
6 epochs, 2 layers of size 512, encoder BRNN, Embeddings 256.
47.7M parameters
Newstest2017: 23.41

In section 4.3.1 of the Sockeye paper they take a 92.4M parameter model to show 19.70 for OpenNMT-Lua [Sockeye 23.18 / Marian 23.54 / Nematus 23.86]
Their setup: 20 epochs !! 1 layer 1000 / embeddings 500

Of course I am not using exactly their setup but the presentation is definitely misleading.

I will post more runs in this thread.

NB: we use a very strict in-house cleaning process, which retains only 4.1M of the 5.5M segments. This should not have a major impact; I mention it only to point out that we used less data.

Second run.
2 layers of 1024, embeddings 256.
100.8M parameters
6 epochs (9 hours per epoch)
Newstest2017: 24.94

Interesting. I wonder what score you would get if you doubled the number of epochs.

Third run.
Same with embeddings 512, 121M Parameters.
Same perplexity as the previous run, yet Newstest2017: 24.67

@tel34 My point was just to check that the results reported in a few other papers were erroneous,
not to get the highest possible score. That said, it is indeed already very competitive with published WMT
results without backtranslation.

Fourth run.
Slightly closer to the first example of the paper.
1 layer 1024, embeddings 512. 95.9M parameters.
For some reason I had to start the learning rate at 0.7; otherwise training diverged.
Newstest2017: 23.78
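
For convenience, the four runs so far can be put side by side. This is only a small Python sketch: the numbers are copied from the posts above, the field names are my own, and the 19.70 baseline is the OpenNMT-Lua figure from the Sockeye paper, which used a different setup, so the gap is indicative rather than a like-for-like comparison.

```python
# Runs reported in this thread (Newstest2017, cased BLEU).
# Field names are illustrative; values are copied from the posts above.
runs = [
    {"run": 1, "layers": 2, "rnn_size": 512,  "emb": 256, "params_m": 47.7,  "bleu": 23.41},
    {"run": 2, "layers": 2, "rnn_size": 1024, "emb": 256, "params_m": 100.8, "bleu": 24.94},
    {"run": 3, "layers": 2, "rnn_size": 1024, "emb": 512, "params_m": 121.0, "bleu": 24.67},
    {"run": 4, "layers": 1, "rnn_size": 1024, "emb": 512, "params_m": 95.9,  "bleu": 23.78},
]

# OpenNMT-Lua score reported in the Sockeye paper (different setup: 20 epochs,
# 1 layer of 1000, embeddings 500), so the delta is only indicative.
PAPER_OPENNMT_LUA_BLEU = 19.70


def delta_vs_paper(run):
    """BLEU difference between a run in this thread and the paper's figure."""
    return round(run["bleu"] - PAPER_OPENNMT_LUA_BLEU, 2)


best = max(runs, key=lambda r: r["bleu"])

if __name__ == "__main__":
    print(f"{'run':>3} {'layers':>6} {'size':>5} {'emb':>4} {'params(M)':>9} {'BLEU':>6} {'delta':>6}")
    for r in runs:
        print(f"{r['run']:>3} {r['layers']:>6} {r['rnn_size']:>5} {r['emb']:>4} "
              f"{r['params_m']:>9.1f} {r['bleu']:>6.2f} {delta_vs_paper(r):>+6.2f}")
    print(f"best: run {best['run']} ({best['bleu']:.2f} BLEU)")
```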

Hi Vincent,

Just a bit curious about your BLEU scores: are they tokenised and case-sensitive? Thanks!

Always cased NIST BLEU with mteval-v13a.pl.
So I detokenize the output, and the tokenization is the one built into the NIST script.
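
To make "the tokenization embedded in the NIST script" a bit more concrete, here is a rough Python sketch of the rules as I understand them (the same rules that tools reproducing "13a" tokenization apply). It is an approximation for illustration only, the function name is my own, and the scores above come from running the actual Perl script, not from this code. Case is preserved by default, which matches cased BLEU.

```python
import re

# Substitution rules modeled on the NIST script's international-agnostic
# tokenizer, applied in this order. This is an approximation, not the script.
_RULES = [
    # Put spaces around most ASCII punctuation and symbols
    # (but not '.', ',', '-' or the apostrophe, handled separately or kept).
    (re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])"), r" \1 "),
    # Split a period or comma unless it is preceded by a digit.
    (re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
    # Split a period or comma unless it is followed by a digit.
    (re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
    # Split a dash when it is preceded by a digit.
    (re.compile(r"([0-9])(-)"), r"\1 \2 "),
]


def tokenize_13a(line, preserve_case=True):
    """Approximate the tokenization applied to detokenized text before scoring."""
    line = line.replace("<skipped>", "").replace("-\n", "").replace("\n", " ")
    # Unescape the few SGML entities the script knows about.
    for entity, char in (("&quot;", '"'), ("&amp;", "&"), ("&lt;", "<"), ("&gt;", ">")):
        line = line.replace(entity, char)
    line = f" {line} "
    if not preserve_case:
        line = line.lower()
    for pattern, repl in _RULES:
        line = pattern.sub(repl, line)
    # Collapse the extra whitespace introduced by the rules.
    return " ".join(line.split())


if __name__ == "__main__":
    print(tokenize_13a("Hello, world!"))
    print(tokenize_13a("It costs 1,000.5 euros."))
```

The point is that the system output has to be plain detokenized text: punctuation is split by the scorer itself, while numbers such as 1,000.5 stay in one piece.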

New pre-trained models with better results are now posted on the website.
http://opennmt.net/Models/
