When computing the GOLD perplexity, I get a score closer to the validation perplexity computed during training.
However, they are not the same, and I would actually call them rather different: one is 7.27, the other 10.19. I would expect some variation, but that seems like too much. Is that normal? (The prediction perplexity is 2.14.)
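One common source of such gaps (aside from differences in vocabulary handling or which tokens are counted, e.g. EOS/padding) is how the negative log-likelihood is averaged: a corpus-level perplexity over total tokens is a different statistic from the mean of per-sentence perplexities. A minimal sketch of the two, with hypothetical numbers, just to illustrate the difference:

```python
import math

def corpus_perplexity(nll_per_sentence, tokens_per_sentence):
    """Perplexity from total NLL divided by total token count (corpus-level)."""
    total_nll = sum(nll_per_sentence)
    total_tokens = sum(tokens_per_sentence)
    return math.exp(total_nll / total_tokens)

def mean_of_sentence_perplexities(nll_per_sentence, tokens_per_sentence):
    """Averaging per-sentence perplexities instead -- a different statistic."""
    ppls = [math.exp(nll / n) for nll, n in zip(nll_per_sentence, tokens_per_sentence)]
    return sum(ppls) / len(ppls)

# Toy example: two sentences with different lengths and total NLLs (in nats).
nlls = [10.0, 40.0]
toks = [5, 10]

print(corpus_perplexity(nlls, toks))              # exp(50/15) ≈ 28.0
print(mean_of_sentence_perplexities(nlls, toks))  # (exp(2)+exp(4))/2 ≈ 31.0
```

If the training-time validation score and the GOLD score are averaged differently (or over different token sets), a gap like 7.27 vs 10.19 is plausible without anything being broken.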