I am retraining a relative Transformer ab-ru model with back-translation, but the BLEU score is still lower after 90k training steps.
The parallel corpus (100k sentences + 100k words) gave me a BLEU score of 20 on the test data for the ab-ru model.
I then augmented it with 640k back-translated sentences, but the BLEU score is not climbing above 19 after 90k training steps.
From another post, I found out that setting beam_width: 1 should help.
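In case it helps others, here is a minimal sketch of how that setting might look, assuming an OpenNMT-tf style configuration and assuming the advice refers to decoding when generating the back-translations (the exact keys and their placement are assumptions, not from the original post):

```yaml
# Hypothetical OpenNMT-tf inference settings for generating back-translations.
params:
  # beam_width: 1 disables beam search (greedy decoding).
  beam_width: 1
  # Assumption: sampling_topk: 0 enables random sampling from the full
  # distribution, which has been reported to produce more useful
  # back-translations than beam-search output.
  sampling_topk: 0
  sampling_temperature: 1.0
```

If sampling is not wanted, dropping the `sampling_*` keys and keeping only `beam_width: 1` would give plain greedy decoding.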
Is there anything else I should be aware of that would improve performance?