Hello,
I am retraining a relative Transformer ab-ru model with back-translation, but the BLEU score is still lower after 90k training steps.
The parallel corpus (100k sentences + 100k words) gave me a BLEU score of 20 on the test data for the ab-ru model.
I augmented it with 640k back-translated sentences, but the BLEU score does not climb above 19 after 90k training steps.
From another post, I found out that setting beam_width: 1 should help.
Is there anything else I should be aware of that would improve performance?
Thank you for the link!
Going through section 5.3 (Low-resource vs. high-resource setup) of this article, it seems that in my case beam search (beam size 5) is more effective than sampling. Greedy search was not included in that setting for some reason.
Eventually I will need sampling moving forward, so my question is how to achieve unrestricted sampling from the model distribution, and how to use it in OpenNMT?
Great!
These parameters are used during training, isn't that the case? So when I do inference later on, I would use a different beam width and remove the sampling and noise, right?
To achieve results similar to those in the paper, should these parameters be configured like this:
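A minimal sketch of the inference configuration I have in mind for generating the back-translations (assuming OpenNMT-tf; the parameter names are taken from its documentation and the values are placeholders, so please correct me if any of them is off):

```yaml
params:
  beam_width: 1              # disable beam search; required for random sampling
  sampling_topk: 0           # 0 = sample from the full output distribution
  sampling_temperature: 1.0  # keep the model distribution unmodified
```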
Random sampling happens at inference time, when producing your back-translations.
Basically it means that instead of greedy or beam search, tokens will be sampled randomly from the output distribution at each decoding step.
It's like putting some (additional) noise in your dataset, if you prefer.
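To make the difference concrete, here is a toy sketch of a single decoding step (hypothetical logits, not taken from a real model): greedy search always picks the argmax, while random sampling draws a token from the full output distribution.

```python
import numpy as np

# Hypothetical scores over a 4-token vocabulary at one decoding step.
logits = np.array([2.0, 1.0, 0.5, -1.0])
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> output distribution

greedy_token = int(np.argmax(probs))                          # greedy search
sampled_token = int(np.random.choice(len(probs), p=probs))    # random sampling
```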
When it comes to adding noise to back-translation, I would like to remind you, colleagues, of this paper, which offers a simple approach: Tagged Back-Translation.
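The idea is simply to mark the synthetic data so the model can tell it apart from genuine parallel data. A rough sketch of the preprocessing step (the tag token and helper name are hypothetical; the tag also needs to be added to the source vocabulary):

```python
# Prepend a reserved tag to the source side of every back-translated pair.
BT_TAG = "<BT>"  # assumed tag token, not part of any real sentence

def tag_backtranslated(src_lines):
    """Return back-translated source sentences prefixed with the tag."""
    return [f"{BT_TAG} {line.strip()}" for line in src_lines]

# e.g. tag_backtranslated(["synthetic source sentence"])
#   -> ["<BT> synthetic source sentence"]
```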