I am running some tests and trying to figure out something related to perplexity. During training, the validation set is used to validate each model and a perplexity score is generated.
During translation, a score is also generated (that's the confidence score, according to the comments in the code). If I use the same model and the same data set, the validation perplexity and the translation perplexity are different. Can someone explain what perplexity refers to in each case?
The validation perplexity is computed on the true target data of the validation set.
The translation perplexity is computed on the model's own predictions.
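To make the distinction concrete, here is a minimal sketch of how a perplexity is derived from per-token log-probabilities; the function and the numbers are purely illustrative, not OpenNMT's actual code. The validation/GOLD perplexity exponentiates the average negative log-likelihood the model assigns to the gold target tokens, while the prediction perplexity does the same for the tokens the model itself chose, which it naturally scores much higher, hence the lower number.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# GOLD / validation perplexity: log-probs the model assigns to the *true* target tokens.
gold_log_probs = [-2.1, -0.4, -1.8, -0.9]   # hypothetical values
print(perplexity(gold_log_probs))           # ~3.67

# Prediction perplexity: log-probs of the tokens the model itself produced (beam output),
# which are high-probability tokens by construction, so the perplexity is lower.
pred_log_probs = [-0.2, -0.1, -0.3, -0.15]  # hypothetical values
print(perplexity(pred_log_probs))           # ~1.21
```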
Yes, you can provide the target data during translation with the option -tgt. Then the GOLD perplexity should be comparable with the validation perplexity.
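For reference, the translation run would look something like this (assuming the OpenNMT-py translate.py script; flag names can differ between versions). With -tgt given, the script reports a GOLD PPL in addition to the PRED PPL:

```
python translate.py -model model.pt -src valid.src -tgt valid.tgt -output pred.txt
```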
When I compute the GOLD perplexity, I do get a score closer to the validation perplexity from training.
However, the two are still quite different: one is 7.27, the other 10.19. I would expect some variation, but that seems like too much. Is that normal? (The prediction perplexity is 2.14.)
The preprocessing certainly dropped some validation sentences during training (due to length constraints), and those sentences are not filtered out during translation. That's the only additional difference I can think of right now.
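If you want to rule that out, you could apply the same length filter to the validation files before translating, so both perplexities are computed over the same sentences. A rough sketch, where the 50-token limit and the file names are placeholders to be replaced with whatever preprocess settings you trained with:

```python
# Keep only the validation pairs that would have survived preprocessing,
# assuming a maximum source/target length of 50 tokens (adjust to your settings).
MAX_LEN = 50

with open("valid.src") as fs, open("valid.tgt") as ft, \
     open("valid.filtered.src", "w") as out_src, open("valid.filtered.tgt", "w") as out_tgt:
    for src, tgt in zip(fs, ft):
        if len(src.split()) <= MAX_LEN and len(tgt.split()) <= MAX_LEN:
            out_src.write(src)
            out_tgt.write(tgt)
```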