I am going to conduct an Online Adaptation experiment for updating a model segment by segment, on the fly. I am starting with the basic idea of continuing to train the current model with batch_size = 1.
I would highly appreciate guidance on whether this approach has been tested before with OpenNMT, and/or whether there are factors to take into consideration during the experiment. Many thanks!
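For context, this is a minimal sketch of the kind of run I have in mind, assuming OpenNMT-py and its `train_from` / `reset_optim` options (the config file and checkpoint names below are placeholders, and option names should be checked against your version):

```python
# Hedged sketch: continue training from an existing checkpoint on new
# in-domain segments, one sentence per batch. Assumes OpenNMT-py's CLI;
# "finetune.yaml" and the checkpoint name are hypothetical.
import subprocess

subprocess.run([
    "onmt_train",
    "-config", "finetune.yaml",                # data/vocab config, prepared beforehand
    "-train_from", "baseline_step_100000.pt",  # hypothetical baseline checkpoint
    "-reset_optim", "all",                     # restart optimizer state for the new, small LR
    "-learning_rate", "0.0001",                # keep updates gentle to limit forgetting
    "-batch_size", "1",                        # one sentence per update
    "-batch_type", "sents",
    "-train_steps", "100050",                  # only a few steps past the checkpoint
], check=True)
```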
See this paper for example. Also, this thesis gives some context.
If I understand correctly, @francoishernandez once said that training on a very small number of sentences is not fully supported by OpenNMT. However, I still believe that incremental training, sentence after sentence, can affect the model (after some time) if the right parameters are used.
My note about batch_size = 1 might not be exactly accurate, as you are training on one example anyhow; still, you get the idea.
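To make the idea concrete, here is a minimal PyTorch-style sketch of per-sentence updating, written independently of OpenNMT internals; `model` (assumed to return the training loss) and `segment_stream` are hypothetical stand-ins:

```python
# Hedged sketch of online adaptation: one gradient step per approved segment.
# `model` and `segment_stream` are hypothetical; in practice the loss would
# come from the toolkit's own training loop.
import torch

def online_adapt(model, segment_stream, lr=1e-5, max_norm=1.0):
    # Small LR and gradient clipping, since single-example updates are noisy.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for src, tgt in segment_stream:   # one post-edited (source, target) pair at a time
        loss = model(src, tgt)        # assumed to return the token-level NLL
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
```

The "right parameters" here would mostly be the learning rate and how many consecutive single-sentence updates you allow before the model starts to drift.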
Alternatively, an Adaptive Machine Translation approach (adopted by ModernMT, for example) can be used; it is elaborated on in this paper:
Basically, the steps are as follows (a rough Python sketch follows the list):
1. Given a source input q (this can range from a single translation unit to an entire document), extract from the dataset/TM the top (source, target) pairs in terms of similarity between the source and q.
2. Use the retrieved pairs to fine-tune the baseline model, which is then applied to translate q.
3. After the translator edits the MT translation and approves it, add it to the dataset/TM.
4. Reset the adapted model to the original parameters, translate the next input source, and so on.
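Something like this (a hedged sketch, assuming a PyTorch-style model exposing `state_dict()` / `load_state_dict()`; fuzzy matching here uses Python's difflib, and `finetune` / `translate` are hypothetical stand-ins for your training and inference code):

```python
# Hedged sketch of the retrieve / fine-tune / translate / reset cycle above.
import copy
import difflib

def top_k_matches(query, tm, k=5):
    # Step 1: score every TM pair by source-side similarity to the query.
    scored = [(difflib.SequenceMatcher(None, query, src).ratio(), src, tgt)
              for src, tgt in tm]
    scored.sort(reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:k]]

def adaptive_translate(model, tm, query, finetune, translate):
    baseline_state = copy.deepcopy(model.state_dict())  # snapshot the baseline
    finetune(model, top_k_matches(query, tm))           # step 2: adapt on retrieved pairs
    hypothesis = translate(model, query)                # step 2: translate q
    model.load_state_dict(baseline_state)               # step 4: reset to baseline
    return hypothesis

# Step 3 happens outside this function: once the translator approves the
# post-edited output, append (query, approved_target) to `tm`.
```

In a real system you would replace the linear scan with a proper fuzzy-match index, but the cycle is the same.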
This approach can be less risky than updating the model segment by segment, as resetting the model (step 4) can help avoid the "catastrophic forgetting" that could result from fine-tuning the baseline model on only a few sentences.