I'm looking to validate my trained datasets. I have looked into BLEU, but it seems like I have to pay for it. Are there any better alternatives?
None is perfect, but here are a few of them:
Personally, I like to look at BLEU and WER scores.
WER stands for Word Error Rate. Unlike the other metrics, it does not use n-grams, so it gives you a different kind of insight.
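WER is the word-level edit distance between the hypothesis and the reference (substitutions S, deletions D, insertions I), divided by the number of reference words N: WER = (S + D + I) / N. Here is a minimal sketch using plain dynamic programming. It assumes whitespace tokenization and no text normalization, and the function name `wer` is just for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: distances from ref[:i-1] to each hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` gives 1/6 ≈ 0.167 (one deleted word). WER can exceed 1 when the hypothesis contains many insertions.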
I would suggest having a look at this:
Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
I personally use chrF++ and COMET, plus BLEU for comparison with other methods.
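chrF scores character n-gram overlap rather than word n-grams, so it gives partial credit for morphological variants. As a rough illustration, here is a simplified sentence-level version that follows the original chrF definition: character n-grams of order 1 to 6 with spaces removed, precision and recall averaged over orders, and the two combined as an F-score with β = 2. Passing `word_order=2` also mixes in word unigrams and bigrams, which is the idea behind chrF++. This is a sketch under those assumptions, not a reference implementation. The function names are hypothetical, and the scores will not necessarily match a library such as sacrebleu.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-gram counts; spaces are removed first, as in the original chrF."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-gram counts over whitespace-split tokens."""
    toks = text.split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def chrf(hyp: str, ref: str, char_order: int = 6, word_order: int = 0,
         beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 100]; word_order=2 gives a chrF++-style score."""
    precisions, recalls = [], []
    orders = [(char_ngrams, n) for n in range(1, char_order + 1)]
    orders += [(word_ngrams, n) for n in range(1, word_order + 1)]
    for extract, n in orders:
        h, r = extract(hyp, n), extract(ref, n)
        if not h or not r:
            continue  # simplification: skip orders longer than the sentence
        matches = sum((h & r).values())  # clipped n-gram matches
        precisions.append(matches / sum(h.values()))
        recalls.append(matches / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

Because it works on characters, a paraphrase such as "there is a cat on the mat" against the reference "the cat is on the mat" still gets a score above 0 here, whereas exact word 4-gram matching would give it nothing.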
If I can add some insight: depending on what you are trying to achieve, you might choose a different score.
If you're trying to see whether a translation matches the style of your current text, rather than focusing on its meaning, BLEU will be better in that case.
If you want to create a model that is generic rather than tailored to a specific translator's style, then BLEU is definitely not the best way to go.
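To make the style-versus-meaning point concrete: BLEU only counts exact word n-gram matches (up to 4-grams) against the reference, multiplied by a brevity penalty, so a paraphrase with the same meaning can score very low. BLEU is an open formula rather than a paid product, and free implementations such as sacrebleu exist. The sketch below is a hypothetical, unsmoothed, sentence-level version for illustration only. Standard BLEU is computed at corpus level, and libraries apply their own tokenization and smoothing, so real numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Word n-gram counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU in [0, 100] with uniform n-gram weights."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(h, n), ngrams(r, n)
        total = sum(hyp_counts.values())
        matches = sum((hyp_counts & ref_counts).values())  # clipped counts
        if total == 0 or matches == 0:
            return 0.0  # no smoothing: any zero precision makes BLEU 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(h) > len(r) else math.exp(1 - len(r) / len(h))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)
```

With this sketch, an exact copy of "the cat is on the mat" scores 100, while the paraphrase "there is a cat on the mat" scores 0 because it shares no 4-gram with the reference, even though the meaning is nearly the same.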