- To fully reproduce the training of the pre-trained model, which SentencePiece parameters were used? I have the SentencePiece model, but I'd like to know how to train it myself and get the same result.
- When reproducing the BLEU scores (26 on news14, 28 on news17 for the pre-trained model), I presume test.de and pred.de must be detokenized (i.e., underscores removed)?
- The SentencePiece model was generated with this script: https://github.com/OpenNMT/OpenNMT-tf/blob/r1/scripts/wmt/prepare_data.sh. Look for “spm_train” to find the SentencePiece training parameters.
- Yes, BLEU is reported on detokenized output.
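For reference, here is a rough sketch of what that `spm_train` call typically looks like for an English-German WMT setup. The file names and parameter values below are illustrative only; the authoritative parameters are in the `prepare_data.sh` script linked above (search for "spm_train"):

```shell
# Illustrative spm_train invocation -- check prepare_data.sh for the
# exact corpus files, vocabulary size, and other options actually used.
spm_train \
  --input=train.en,train.de \
  --model_prefix=wmtende \
  --vocab_size=32000 \
  --character_coverage=1.0
```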
I did this more than 2 years ago.
If you’re doing this for an academic purpose it’s fine.
If you want a higher BLEU (> 32-33), you'll need to use back-translation.
Thanks @guillaumekln, but why can't I get the same BLEU score on the pre-trained ONMT model even after detokenizing?
BLEU = 23.16, 51.6/29.0/17.5/11.1 (BP=0.998, ratio=0.998, hyp_len=52721, ref_len=52833)
not a BLEU of 26.
I’m comparing to test.de from wmt14.
Is that the same as news14??
Isn’t news14 in the training??
news14 is not in the training data, of course.
Post your command line for computing BLEU.
I'm computing BLEU with the perl script provided with OpenNMT-py:
Well, I'm not sure what your files above are, but the workflow is the following:
detokenized data => tokenize with SentencePiece => translate => tokenized output => detokenize output
Preferably, use multi-bleu-detok.perl on detokenized data to compare with papers.
If your test.de is detokenized, then you need to detokenize your .pred file and use that detokenized-BLEU perl script, not the plain multi-bleu.perl.
hope this helps.
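The detokenize step in the workflow above can be illustrated at the subword-piece level. This is a minimal sketch of my own (the helper name is mine, not from any OpenNMT tool): SentencePiece marks word boundaries with "▁" (U+2581), so detokenizing amounts to joining the pieces and turning that marker back into spaces:

```python
# Minimal sketch of SentencePiece-style detokenization: subword pieces
# use "\u2581" to mark the start of each original word, so restoring
# plain text is just a join plus a marker-to-space substitution.

def detokenize(pieces):
    """Join subword pieces and restore word boundaries."""
    return "".join(pieces).replace("\u2581", " ").strip()

pred_pieces = ["\u2581Das", "\u2581ist", "\u2581ein", "\u2581Bei", "spiel", "."]
print(detokenize(pred_pieces))  # -> Das ist ein Beispiel.
```

Note that real detokenization should go through `spm_decode` (or the `sentencepiece` Python API), which also handles edge cases; this sketch only shows why the underscores disappear.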
That's what I'm doing, but it looks like I'm using the wrong BLEU script.
That was it.
I was using the wrong perl script.
Q. What would the non-detok BLEU perl script have been computing, then?