I see that OpenNMT-tf supports back translation, and many users are interested in this.
I’m slightly confused about which commands and parameters I need to change in the latest version to make it work. I did not see any clear example of how to go about doing this, hence the confusion.
Do I pass the monolingual data file separately? If not, what pre-processing do I need to do to append it to the training data?
Are there any additional model parameters I need to set? I do see some config parameters like “freeze_layers”, “sampling_topk”, and “decoding_noise” available.
All NMT systems support back translation, as it is just a combination of training and translating: you train a model in the reverse direction and translate a monolingual corpus to generate a new synthetic corpus. Are you looking for a script that runs this workflow for you? There is no such thing in OpenNMT-tf.
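To make the workflow concrete, here is a rough sketch using the `onmt-main` CLI. The `onmt-main` calls are left as comments since they require a trained model, and all file and config names here are my own assumptions, not fixed conventions; the only part actually executed is the concatenation step that builds the augmented training set:

```shell
# 1. Train a reverse-direction model (target -> source):
# onmt-main --model_type Transformer --config reverse_config.yml --auto_config train

# 2. Translate the target-language monolingual corpus with that model
#    to produce synthetic source-language sentences:
# onmt-main --config reverse_config.yml --auto_config infer \
#     --features_file mono.tgt --predictions_file synthetic.src

# 3. Append the synthetic pairs to the real parallel data.
#    (Tiny dummy files here, just to illustrate the concatenation.)
printf 'real src 1\nreal src 2\n' > train.src
printf 'real tgt 1\nreal tgt 2\n' > train.tgt
printf 'synthetic src 1\n'        > synthetic.src
printf 'mono tgt 1\n'             > mono.tgt

cat train.src synthetic.src > train_bt.src
cat train.tgt mono.tgt      > train_bt.tgt

# The augmented files must stay line-aligned:
wc -l < train_bt.src   # -> 3
wc -l < train_bt.tgt   # -> 3
```

You then point your forward-direction training config at `train_bt.src`/`train_bt.tgt` and train as usual.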
The parameters “sampling_topk” and “decoding_noise” are decoding options that were proposed in the following paper:
They can be used to improve back translation results but are not required.
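If you do want to try them, they would go in the `params` section of the YAML configuration used when translating the monolingual corpus. This is only a sketch from memory; the specific values and noise modules below are illustrative assumptions, so please verify the exact schema against the OpenNMT-tf documentation:

```shell
# Write a minimal params section enabling sampled decoding and noise.
# (Values are illustrative, not tuned recommendations.)
cat > bt_params.yml <<'EOF'
params:
  # Sample from the 10 most likely tokens instead of pure beam search:
  sampling_topk: 10
  # Apply noise to the generated translations:
  decoding_noise:
    - dropout: 0.1
    - permutation: 3
EOF

grep -c 'sampling_topk' bt_params.yml   # -> 1
```

These settings only affect the inference step that generates the synthetic data, not the final forward-model training.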
No problem, I can run those steps. How do I get started? Below is my thinking:
Let’s say I’m building an English-Spanish system and I have a large Spanish monolingual corpus. Do I just run inference on this monolingual corpus using a pre-trained Spanish-English model to get synthetic English sentences? And at what point in the process will I need to set the back-translation parameters mentioned above?