Different BLEU scores with test2015.de.atok and test2016.de



I have a question regarding the example translation using PyTorch.
When I use the tool to calculate the BLEU score, I get 37.12 with the test2016.de.atok file. This score seems far too high.
With the test2016.de file, however, I only score 28.71, which seems far more plausible.
When comparing the two files, the only difference I noticed was in the punctuation.
I would appreciate any kind of help.
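
For illustration, here is a minimal sketch of why a punctuation-only difference could change the score this much. It is not the scoring tool's actual code; it only counts clipped n-gram matches on whitespace-split tokens, which is the building block of BLEU. The sentences and the helper names `ngrams` and `clipped_matches` are made up for this example. The assumption is that the .atok file has punctuation split off into separate tokens while the other file keeps it attached to the words.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hyp, ref, n):
    """Return (matched, total) hypothesis n-grams, clipped by reference counts.

    Both strings are split on whitespace only, so "schnell." and
    "schnell ." produce different tokens.
    """
    hyp_counts = ngrams(hyp.split(), n)
    ref_counts = ngrams(ref.split(), n)
    matched = sum((hyp_counts & ref_counts).values())
    return matched, sum(hyp_counts.values())


# Hypothetical sentence pair, once with attached punctuation ...
raw_hyp = "Ein Hund rennt, schnell."
raw_ref = "Ein Hund geht, schnell."
# ... and once with punctuation split into separate tokens.
tok_hyp = "Ein Hund rennt , schnell ."
tok_ref = "Ein Hund geht , schnell ."

for n in (1, 2):
    print(n, clipped_matches(raw_hyp, raw_ref, n), clipped_matches(tok_hyp, tok_ref, n))
# 1 (3, 4) (5, 6)
# 2 (1, 3) (3, 5)
```

In this toy case the separated punctuation marks count as extra matching tokens and n-grams, so the precision goes up even though the translated words are identical. If something similar happens with the two test files, the two scores would not be directly comparable.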
