paulkp
(Paul Pallaghy)
March 31, 2020, 11:45pm
1
To FULLY reproduce the TRAINING of the pre-trained model, WHICH SentencePiece parameters were used? I’ve got the SentencePiece model, but I’d love to know how to train it MYSELF and get the same result.
When reproducing the BLEU scores (26 on news14, 28 on news17 for the pre-trained model), I presume test.de and pred.de must be detokenized (back to plain text with no underscores)?
Thanks in advance, everyone, e.g. @guillaumekln @francoishernandez
vince62s
(Vincent Nguyen)
April 1, 2020, 4:04pm
3
Paul,
I did this more than 2 years ago.
If you’re doing this for an academic purpose it’s fine.
If you want a higher BLEU (> 32-33), you’ll need to use back-translation.
Enjoy.
Vincent
paulkp
(Paul Pallaghy)
April 8, 2020, 1:26am
4
Thanks @guillaumekln but why can’t I get the same BLEU score on the pre-trained onmt model EVEN AFTER detokenizing?
I get:
BLEU = 23.16, 51.6/29.0/17.5/11.1 (BP=0.998, ratio=0.998, hyp_len=52721, ref_len=52833)
not a BLEU of 26.
I’m comparing to test.de from wmt14.
Is that the same as news14??
Isn’t news14 in the training??
vince62s
(Vincent Nguyen)
April 9, 2020, 8:41am
5
news14 is not in the training data, of course.
Post the command line you use to compute your BLEU.
paulkp
(Paul Pallaghy)
April 15, 2020, 5:09am
6
@vince62s
My BLEU comes from the Perl script provided with OpenNMT-py:
perl tools/multi-bleu.perl /datadrive/wmt14-ende_sp/data/test.de < /datadrive/wmt14-ende_sp/preds/wmt14-ende_sp5_200K_ntok.pred &> /datadrive/wmt14-ende_sp/logs/wmt14-ende_sp5_200K_BLEU.log
Thoughts?
vince62s
(Vincent Nguyen)
April 15, 2020, 7:14am
7
Well, I’m not sure what your files above are, but the workflow is the following:
detokenized data => tokenize with SentencePiece => translate => tokenized output => detokenize output
Preferably use multi-bleu-detok.perl on detokenized data to compare with papers.
If your test.de is detokenized, then you need to detokenize your .pred file and use the other Perl script (multi-bleu-detok.perl).
Hope this helps.
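For the last step of that workflow (detokenizing the output), here is a minimal sketch in Python. It assumes the .pred file contains space-separated SentencePiece pieces, with the usual "▁" (U+2581) marker at the start of each word. The function names are hypothetical and are not part of OpenNMT-py or SentencePiece.

```python
# SentencePiece's word-boundary marker (the "underscore" seen in tokenized text).
SP_SPACE = "\u2581"


def detokenize_pieces(line: str) -> str:
    """Turn one line of space-separated SentencePiece pieces into plain text."""
    pieces = line.split()
    # Glue the pieces together, then turn each marker back into a real space.
    return "".join(pieces).replace(SP_SPACE, " ").strip()


def detokenize_file(src_path: str, dst_path: str) -> None:
    """Detokenize a whole prediction file, line by line (paths are up to you)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(detokenize_pieces(line) + "\n")


print(detokenize_pieces("\u2581Das \u2581ist \u2581ein \u2581Test ."))
```

The detokenized file is what you would then score against a detokenized reference with multi-bleu-detok.perl.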
paulkp
(Paul Pallaghy)
April 15, 2020, 9:22am
8
@vince62s
That’s what I’m doing, BUT it looks like I’m using the wrong BLEU script...
paulkp
(Paul Pallaghy)
April 16, 2020, 12:59am
9
@vince62s
That was it.
I was using the wrong Perl script.
Q. What would the non-detok BLEU Perl script have been doing??
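A likely explanation, offered as a hedged sketch rather than a description of the Perl code: as far as I understand, multi-bleu.perl expects already-tokenized input and simply splits on whitespace, whereas multi-bleu-detok.perl applies its own tokenization before counting n-grams. On detokenized text, whitespace splitting leaves punctuation glued to words ("gut," vs "gut"), so fewer n-grams match. The toy Python below only illustrates that effect on unigrams. The regex tokenizer is a crude stand-in, not the actual tokenization used by either script, and all names are hypothetical.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    # What a scorer sees if it only splits on whitespace.
    return text.split()


def punct_split_tokens(text):
    # Crude stand-in for a real tokenizer: separate punctuation from words.
    return re.findall(r"\w+|[^\w\s]", text)


def unigram_matches(hyp, ref, tokenize):
    """Return (clipped unigram matches, hypothesis length) for one sentence pair."""
    hyp_counts = Counter(tokenize(hyp))
    ref_counts = Counter(tokenize(ref))
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


ref = "Das ist gut, danke."
hyp = "Das ist gut danke!"
print(unigram_matches(hyp, ref, whitespace_tokens))   # (2, 4)
print(unigram_matches(hyp, ref, punct_split_tokens))  # (4, 5)
```

With whitespace splitting, "gut" and "danke!" fail to match "gut," and "danke.", so only 2 of 4 hypothesis tokens count. Once punctuation is separated, 4 of 5 do.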