I modified the Transformer and propose a new model. My model is clearly better than the Transformer in accuracy and perplexity (ppl) on the validation set. On the English-Vietnamese translation task, the best ppl and accuracy of the Transformer on the validation set during training are 14 and 56, while the corresponding scores of my model are 3 and 77. But when I calculate BLEU scores, the BLEU score of my model is below 10 on both the validation set and the test set, while the BLEU score of the Transformer is above 20.
What is the relationship between ppl, accuracy and BLEU scores in machine translation? In my experiment, although ppl and accuracy on the validation set are greatly improved, the BLEU score is reduced.
Could it be that the cross-entropy loss function is related to ppl and accuracy, but not to BLEU?
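
To make the question concrete, here is a minimal sketch of how the two validation metrics are commonly computed. It assumes (my toolkit may differ) that ppl is the exponential of the mean token-level cross-entropy and that accuracy is next-token accuracy measured with the reference prefix fed to the decoder (teacher forcing). The function names and the toy probabilities are hypothetical, for illustration only. BLEU, by contrast, is computed from n-gram overlap on sentences the model decodes on its own, so it is not a direct function of either number.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the reference tokens.

    probs[i] is the predicted distribution at position i (given the
    reference prefix), targets[i] is the reference token id.
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def perplexity(probs, targets):
    """Perplexity is exp of the mean cross-entropy (assumed definition)."""
    return math.exp(cross_entropy(probs, targets))

def token_accuracy(probs, targets):
    """Fraction of positions where the argmax equals the reference token."""
    hits = sum(
        1 for p, t in zip(probs, targets)
        if max(range(len(p)), key=p.__getitem__) == t
    )
    return hits / len(targets)

# Toy data: 3 target positions, vocabulary of 4 tokens (made-up numbers).
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.25, 0.25, 0.40, 0.10],
]
targets = [0, 1, 3]

print("cross-entropy:", cross_entropy(probs, targets))
print("perplexity:   ", perplexity(probs, targets))
print("accuracy:     ", token_accuracy(probs, targets))
```

Under these assumptions, lowering the cross-entropy loss lowers ppl by definition and tends to raise token accuracy, while neither metric looks at the output of free-running decoding that BLEU scores.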