Validation and Translation Perplexity


I am running some tests and trying to understand something related to perplexity. During training, the validation set is used to evaluate each model checkpoint, and a perplexity score is reported.

During translation, a score is also generated (the confidence score, according to the comments in the code). If I use the same model and the same data set, the validation perplexity and the translation perplexity are different. Can someone explain what perplexity refers to in each case?



The validation perplexity is the perplexity of the true target data in the validation set.
The translation perplexity is the perplexity of the model's own predictions.
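In both cases, perplexity is the exponential of the average negative log-probability per target token; the difference is only which token sequence the probabilities are scored against (the gold targets vs. the model's own output). A minimal sketch, with made-up token probabilities:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical log-probs the model assigns to the gold target tokens
gold_lp = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(perplexity(gold_lp))  # validation / GOLD-style perplexity

# The model's own (beam-search) prediction is, by construction, a
# high-probability sequence, so its perplexity is typically lower
pred_lp = [math.log(0.9), math.log(0.8), math.log(0.9)]
print(perplexity(pred_lp))  # prediction-style perplexity
```

This is why the prediction perplexity is usually much lower than the validation perplexity: the decoder searches for a high-probability sequence, while the gold targets are fixed.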

See also this topic:

Thanks a lot for the prompt response.
Is there a way to compute the validation perplexity after the model is trained?

Thank you in advance.

Yes, you can provide the target data during translation with the option -tgt. Then the GOLD perplexity should be comparable with the validation perplexity.
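As a sketch, the invocation might look like this (model and file names are placeholders, flags as in OpenNMT-py's translate script):

```shell
# Score the validation targets with a trained model; the GOLD
# perplexity is printed alongside the prediction score.
onmt_translate -model model.pt \
               -src valid.src \
               -tgt valid.tgt \
               -output pred.txt \
               -verbose
```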

Yes, I checked it in the documentation meanwhile… sorry about the question :).


When computing the GOLD perplexity I get a score closer to the validation perplexity computed during training.

However, they are not the same, and I would say they are actually rather different: one is 7.27, the other is 10.19. I would expect some variation, but that seems like a bit too much. Is that normal? (The prediction perplexity is 2.14.)

Thank you,
Kind regards,

The preprocessing certainly dropped some validation sentences during training (due to length constraints) which are not filtered during translation. That’s the only additional difference I can think of right now.
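The effect of such a length filter can be illustrated with a small sketch (the 50-token limit is an assumption, not necessarily the default used here):

```python
def length_filter(src_sents, tgt_sents, max_len=50):
    """Keep only pairs where both sides fit within max_len tokens,
    mimicking the preprocessing filter applied before training.
    Translation scores the full, unfiltered validation set, so the
    two perplexities are computed over different sentence sets."""
    return [
        (s, t)
        for s, t in zip(src_sents, tgt_sents)
        if len(s.split()) <= max_len and len(t.split()) <= max_len
    ]
```

Long sentences tend to be harder to model, so dropping them during validation but not during translation would shift the two perplexities in exactly this direction.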

OK, interesting.

Thank you.

Try increasing your vocab size and then give it another try.