BLEU alternatives?

Looking to validate my trained datasets. I have looked into BLEU but it seems like I have to pay for it. Are there any better alternatives around?


None is perfect, but here are a few of them:


Personally, I like to look at both the BLEU and WER scores.

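WER stands for word error rate. Unlike the other metrics, it does not use n-grams: it counts the word-level edits (substitutions, insertions, deletions) needed to turn the output into the reference, so it gives a different kind of insight.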
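To make the idea concrete, here is a minimal sketch of WER as a word-level edit distance divided by the reference length. The function name `word_error_rate` and the whitespace tokenization are my own assumptions for illustration; real toolkits usually normalize and tokenize the text more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better here, and the score can exceed 1.0 when the hypothesis is much longer than the reference.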

I would suggest having a look at this:

Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust

I personally use chrF++ and COMET, and additionally report BLEU for comparison with other methods.
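For anyone unfamiliar with chrF, the rough idea is an F-score over character n-grams. Below is a simplified sketch of that idea only; it is not the official chrF++ implementation (which also adds word n-grams and handles details differently). The name `simple_chrf` and its defaults (`max_n=6`, `beta=2.0`) are assumptions for illustration. For real evaluations, an established open-source implementation such as sacreBLEU is the safer choice.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(reference: str, hypothesis: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1]; higher is better."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref_counts = char_ngrams(reference, n)
        hyp_counts = char_ngrams(hypothesis, n)
        if not ref_counts or not hyp_counts:
            continue  # text too short for this n-gram order
        # Clipped overlap: each n-gram counts at most as often as in the other side.
        overlap = sum((ref_counts & hyp_counts).values())
        precisions.append(overlap / sum(hyp_counts.values()))
        recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(simple_chrf("the cat sat", "the cat sits"))
```

Because it works on characters rather than whole words, a score like this gives partial credit for near-miss word forms that a word-level metric would count as fully wrong.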

If I can add some insight… Depending on what you are trying to achieve, you might choose a different score.

If you're trying to see whether a translation matches the style of your existing text, with less focus on the meaning, then BLEU will be the better choice.

If you want to create a generic model that is not tailored to a specific translator's style, then BLEU is certainly not the best way to go.