WMT17 EN-DE Benchmark

vince62s · January 27, 2018, 5:12pm

In a recent paper (Sockeye: A Toolkit for Neural Machine Translation), some results were published for OpenNMT-Lua.

I would like to publish mine.

Settings:
Corpus: CommonCrawl, Europarl, NewscommentaryV12, Rapid2016
6 epochs, 2 layers of size 512, encoder BRNN, Embeddings 256.
47.7M parameters
Newstest2017: 23.41

In section 4.3.1 of the Sockeye paper, they use a 92.4M-parameter model to report 19.70 for OpenNMT-Lua [Sockeye 23.18 / Marian 23.54 / Nematus 23.86].
Their setup: 20 epochs (!!), 1 layer of size 1000, embeddings 500.

Of course I am not using exactly their setup, but the presentation is definitely misleading.

I will post more runs in this thread.

NB: we use a very strict in-house cleaning process that retains only 4.1M of the 5.5M segments. This should not have a major impact; I mention it only to point out that we used less data.

vince62s · January 28, 2018, 2:03pm

Second run.
2 layers of 1024, embeddings 256.
100.8M parameters
6 epochs (9 hours per epoch)
Newstest2017: 24.94

tel34 · January 28, 2018, 3:47pm

Interesting. I wonder what score you would get if you doubled the number of epochs.

vince62s · January 28, 2018, 3:50pm

Third run.
Same, but with embeddings 512: 121M parameters.
Even though the perplexity is the same as in the previous run, Newstest2017: 24.67

@tel34 My point was just to check that the results reported by a few other papers were erroneous,
not to get the highest possible score. But indeed, it’s already very competitive with published WMT
results without back-translation.

vince62s · January 29, 2018, 7:47am

Fourth run.
Slightly closer to the first example in the paper.
1 layer 1024, embeddings 512. 95.9M parameters.
For some reason I had to start the LR at 0.7, otherwise it diverged.
Newstest2017: 23.78

duyvuleo · February 1, 2018, 12:03am

Hi Vincent,

Just a bit curious about your BLEU scores: are they tokenised and case-sensitive? Thanks!

vince62s · February 1, 2018, 7:46am

Always cased NIST BLEU with mteval-v13a.pl.
So I detokenize the output, and the tokenization is the one built into the NIST script.
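
For readers who want to see what this kind of scoring amounts to, here is a minimal Python sketch of cased, corpus-level BLEU computed on detokenized text with a 13a-style tokenizer. It is an illustration only, not the NIST script itself: the function names `tokenize_13a` and `corpus_bleu` are made up for this example, a single reference per segment is assumed, and the regex rules are the commonly documented "13a" rules for Western languages, so scores may differ slightly from the real tool. For reported numbers, use the NIST script.

```python
import math
import re
from collections import Counter


def tokenize_13a(line):
    """Approximate the tokenization built into the NIST mteval script (13a rules)."""
    line = line.replace("<skipped>", "").replace("-\n", "").replace("\n", " ")
    line = line.replace("&quot;", '"').replace("&amp;", "&")
    line = line.replace("&lt;", "<").replace("&gt;", ">")
    line = " " + line + " "
    # Split off most ASCII punctuation (apostrophe and hyphen are left attached).
    line = re.sub(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])", r" \1 ", line)
    # Split period/comma unless preceded by a digit.
    line = re.sub(r"([^0-9])([\.,])", r"\1 \2 ", line)
    # Split period/comma unless followed by a digit.
    line = re.sub(r"([\.,])([^0-9])", r" \1 \2", line)
    # Split a dash that follows a digit.
    line = re.sub(r"([0-9])(-)", r"\1 \2 ", line)
    return line.split()  # no lowercasing: the score stays case-sensitive


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Cased corpus-level BLEU (one reference per segment), as a percentage."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = tokenize_13a(hyp), tokenize_13a(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each hypothesis n-gram counts at most as often
            # as it appears in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)


if __name__ == "__main__":
    refs = ["The cat sat on the mat."]
    print(corpus_bleu(["The cat sat on the mat."], refs))  # 100.0
    print(corpus_bleu(["the cat sat on the mat."], refs))  # lower: casing counts
```

Because the input is detokenized text and the tokenizer is fixed inside the scorer, scores do not depend on whatever tokenization or subword scheme the system used internally, which is what makes numbers from different toolkits comparable.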

vince62s · March 6, 2018, 2:11pm

New pre-trained models with better results are now posted on the website.
http://opennmt.net/Models/