OpenNMT Forum

Which BLEU script to use?

Reading through the suggestions in the forum, it seems that multi-bleu.perl is the usual tool for computing BLEU scores. However, when I run it, it prints this warning:

It is not advisable to publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.

Which script or tool do you use to compute BLEU scores when you want to publish research work?

The best option is SacreBLEU.
That said, mteval-v14.pl, SacreBLEU and multi-bleu-detok.perl are equivalent; these are the reference scripts used for WMT.
Unfortunately, many past papers reported other BLEU variants, sometimes with additional tricks such as compound splitting or quote fixing.

Edit: these scripts also offer various options for their internal tokenization …
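
To make the tokenizer dependence from the multi-bleu.perl warning concrete, here is a minimal sketch in plain Python. It is not the code of multi-bleu.perl, mteval-v14.pl or SacreBLEU: `corpus_bleu`, `whitespace_tokenize` and `punct_tokenize` are hypothetical helpers written for this illustration, the BLEU is unsmoothed with a single reference per sentence, and the example sentences are made up. It only shows that the same hypothesis and reference get different scores once the tokenization changes.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, tokenize, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis.

    Illustrative only: `tokenize` is whatever function the caller passes in,
    which is exactly the part that differs between scoring scripts.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = tokenize(hyp), tokenize(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often
            # as it occurs in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if min(matched) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty penalizes hypotheses shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def whitespace_tokenize(text):
    """Split on whitespace only; punctuation stays glued to words."""
    return text.split()


def punct_tokenize(text):
    """Split punctuation off into separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Made-up example pair: the hypothesis has one extra comma.
hyps = ["The cat sat on the mat, quietly."]
refs = ["The cat sat on the mat quietly."]

score_ws = corpus_bleu(hyps, refs, whitespace_tokenize)
score_punct = corpus_bleu(hyps, refs, punct_tokenize)
print(f"whitespace tokens:  BLEU = {score_ws:.2f}")
print(f"punctuation split:  BLEU = {score_punct:.2f}")
```

The two printed numbers differ even though nothing about the translation changed, which is why a score is only comparable across papers when the tokenization is fixed by the scoring tool itself rather than by each group's preprocessing.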