How to compute BLEU is a very frequent question. BLEU is an automatic metric for evaluating the quality of your Machine Translation system by comparing its output against human reference translations.
Among the popular methods to calculate BLEU are the multi-bleu.perl script and sacreBLEU. They work very similarly although results might be slightly different; for example, in one of my tests, the score reported by sacreBLEU was 48.23 while the BLEU score reported by multi-bleu.perl was 48.57.
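Important Note: sacreBLEU expects detokenized text, because it applies its own tokenization. multi-bleu.perl scores the text exactly as it is given, so undo any subword segmentation first and make sure the MT output and the reference are tokenized in the same way.
To use multi-bleu.perl, you can simply run this command in your terminal, passing the human reference translation as the argument and the MT output on standard input.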
perl multi-bleu.perl human-translation.txt < mt-pred.txt
To compute BLEU using sacreBLEU instead, I have written a detailed tutorial with scripts, and I hope you find them useful: https://blog.machinetranslation.io/compute-bleu-score/
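As a quick illustration, a minimal Python sketch of the sacreBLEU route might look like the following. It assumes sacreBLEU is installed (pip install sacrebleu) and reuses the two file names from the multi-bleu.perl command above; the helper names read_lines and compute_bleu are only illustrative and are not taken from the tutorial.

```python
def read_lines(path):
    """Return the lines of a UTF-8 text file, one segment per line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def compute_bleu(reference_path, hypothesis_path):
    """Corpus-level BLEU of the MT output against a single reference file."""
    # Third-party dependency (pip install sacrebleu), imported here so that
    # read_lines can be used on its own without it.
    import sacrebleu

    references = read_lines(reference_path)
    hypotheses = read_lines(hypothesis_path)
    if len(references) != len(hypotheses):
        raise ValueError("Reference and MT output must have the same number of lines")

    # corpus_bleu takes the hypotheses and a list of reference sets;
    # there is only one reference set here, hence the extra list.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score


# Example (file names are assumptions, matching the command above):
# print(compute_bleu("human-translation.txt", "mt-pred.txt"))
```

Both files should contain plain, detokenized text with one sentence per line, in the same order.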
Kind regards,
Yasmin