Abnormally low perplexity on a Chinese dialogue dataset with a Transformer

Hi, I’m training a 3-layer Transformer on a Chinese dialogue dataset,
but I get much lower perplexity than the original paper (https://arxiv.org/abs/1911.04700).
The paper reports a perplexity of 43 on the random test set and 89 on the biased test set; I get 27 on the random test set and 29 on the biased test set.

Moreover, the generated responses are very poor: they are not fluent and contain a lot of repetition.

Additionally, I use a character-level vocabulary.
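
To make the vocabulary point concrete, here is a minimal sketch (not my actual training code) of how per-token perplexity depends on the unit it is normalized over. The function names and all the numbers are hypothetical, and I have not verified which token unit the paper uses, so this only illustrates why the two sets of numbers might not be directly comparable.

```python
import math


def perplexity(total_nll: float, num_units: int) -> float:
    """Perplexity = exp(total negative log-likelihood in nats / number of units)."""
    return math.exp(total_nll / num_units)


def rescale_perplexity(ppl: float, units_per_target_unit: float) -> float:
    """Convert a perplexity measured per small unit (e.g. character) into a
    perplexity per larger unit (e.g. word), given the average number of small
    units per larger unit in the same test text."""
    return ppl ** units_per_target_unit


# Hypothetical numbers, for illustration only.
total_nll = 3300.0  # summed NLL (nats) of the whole test set under the model
num_chars = 1000    # number of character tokens in the test set
num_words = 600     # number of word tokens in the same text

char_ppl = perplexity(total_nll, num_chars)
word_ppl = perplexity(total_nll, num_words)

# The same model and the same text give a lower perplexity per character
# than per word, because the same total NLL is spread over more units.
print(f"char-level perplexity: {char_ppl:.2f}")
print(f"word-level perplexity: {word_ppl:.2f}")
print(f"rescaled from char:    {rescale_perplexity(char_ppl, num_chars / num_words):.2f}")
```

If the paper's perplexity were normalized over a different unit than mine, the raw numbers would differ even for an identical model, so they would need to be rescaled like this before being compared.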

I don’t know what is wrong. Could someone help me?
