How to Compute BLEU Score

How to compute BLEU is a frequently asked question. BLEU is an automatic metric for evaluating the quality of a Machine Translation system by comparing its output with human reference translations.

Two popular tools for calculating BLEU are the multi-bleu.perl script and sacreBLEU. They work similarly, although their results can differ slightly; for example, in one of my tests, sacreBLEU reported 48.23 while multi-bleu.perl reported 48.57.

Important Note: Both the multi-bleu.perl script and sacreBLEU work on detokenized text.

To use multi-bleu.perl, run the following command in your terminal, where human-translation.txt is the reference file and mt-pred.txt is the MT output:

perl multi-bleu.perl human-translation.txt < mt-pred.txt
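
For intuition about what such a script reports, here is a minimal Python sketch of corpus-level BLEU. It is an illustration under simplifying assumptions (one reference per sentence, plain whitespace splitting, no smoothing, no lowercasing), not a reimplementation of multi-bleu.perl or sacreBLEU, so its numbers should not be expected to match either tool exactly. The function names are my own.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumption: tokens come from whitespace splitting; no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each hypothesis n-gram count
            # by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Calling corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]) gives 100, and any change to how the text is split into tokens changes the n-gram counts, which is one reason two tools can report slightly different scores on the same files.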

To compute BLEU using sacreBLEU instead, I have written a detailed tutorial with scripts, and I hope you find them useful: https://blog.machinetranslation.io/compute-bleu-score/
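
As a rough idea of what this looks like in Python, here is a short sketch using the sacrebleu package (installed with pip install sacrebleu). It is not the script from the tutorial; the helper names read_lines and sacrebleu_score are hypothetical, and the files are assumed to contain one detokenized sentence per line.

```python
from pathlib import Path


def read_lines(path):
    """Read one sentence per line, without trailing newlines."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def sacrebleu_score(hyp_path, ref_path):
    """Return the corpus-level BLEU score computed by sacreBLEU for two files."""
    hypotheses = read_lines(hyp_path)
    references = read_lines(ref_path)
    if len(hypotheses) != len(references):
        raise ValueError("Both files must contain the same number of lines")

    import sacrebleu  # third-party package: pip install sacrebleu

    # corpus_bleu takes the hypotheses and a list of reference sets
    # (here a single set, i.e. one reference per sentence).
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

With the file names from the command above, the call would be sacrebleu_score("mt-pred.txt", "human-translation.txt").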

Kind regards,
Yasmin
