Recently I have been working on a translation project. I trained an en-zh Transformer model on about 50 million sentence pairs, and it does well on short sentences. But on long sentences the model tends to generate dull translations.
Is it because Transformer models can't handle long (>20 words) sentence translation (maybe due to the degeneration tendency of beam search), or because the sentences in my training data are too short? Is there any research on sentence length in translation models?
Thank you guys for the awesome project and awesome community!
By the way, another question: is there a way to generate translations with more diversity? Even with the n_best setting, the top n translations produced by beam search are very similar. Should I try top-k sampling?
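For context, by top-k sampling I mean something like the minimal sketch below (toy logits, pure Python, not tied to any particular toolkit; in a real decoder this would run on the output distribution at each step):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token id from the k highest-scoring logits.

    Unlike beam search, which deterministically keeps the k best
    partial hypotheses, this draws randomly from the renormalized
    top-k distribution, so repeated calls can give different tokens.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over just those k logits (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one token id proportionally to its renormalized probability.
    return rng.choices(top, weights=probs, k=1)[0]

# Toy vocabulary of 5 tokens; ids 2 and 0 have the highest scores.
logits = [2.0, -1.0, 3.5, 0.5, -2.0]
print(top_k_sample(logits, k=2))  # always 0 or 2
```

My hope is that sampling from the truncated distribution, rather than always following the highest-probability beam, would break the similarity between the n_best hypotheses.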
It would be great if someone could kindly point me to good papers in these areas.