Online Learning

Hello!

I am going to conduct an online adaptation experiment, updating a model segment by segment, on the fly. I am starting with the basic idea of continuing to train a current model with batch_size = 1.
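
For concreteness, here is a minimal sketch of the kind of update loop I have in mind: one gradient step per incoming segment, with a small learning rate so each segment only nudges the weights. The toy `nn.Linear` model and the `online_update` helper are illustrative stand-ins, not the OpenNMT API:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the NMT model: the point here is the loop, not the network.
model = nn.Linear(8, 8)
# A small learning rate, so a single segment only nudges the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def online_update(src: torch.Tensor, tgt: torch.Tensor) -> float:
    """Run one training step on a single (source, target) segment."""
    optimizer.zero_grad()
    loss = loss_fn(model(src), tgt)
    loss.backward()
    optimizer.step()
    return loss.item()

# Simulated stream of segments arriving one at a time.
for step in range(5):
    src, tgt = torch.randn(1, 8), torch.randn(1, 8)
    print(f"segment {step}: loss {online_update(src, tgt):.4f}")
```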

I would highly appreciate guidance on whether this approach has been tested before with OpenNMT, and/or whether there are factors to take into consideration during the experiment. Many thanks!

Kind regards,
Yasmin

Dear Yasmin,
This sounds interesting. Could you point me to a resource to learn more about this?
Regards,
Nart.

Dear Nart,

See this paper, for example. Also, this thesis gives some context.

If I understand correctly, @francoishernandez once said that training on a very small number of sentences is not fully supported by OpenNMT. However, I believe that incremental training, sentence after sentence, can still affect the model (after some time) if the right parameters are used.

My note about batch_size = 1 might not be exactly accurate, as you are training on one example anyhow; still, you get the idea.
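
On the "right parameters" point: the learning rate is the main one, and one cheap sanity check is to monitor how far the weights have drifted from the baseline after each single-sentence update. A minimal sketch (the toy model and the `drift_from_baseline` helper are illustrative, not OpenNMT API):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # toy stand-in for the translation model
# Snapshot the baseline weights before any online updates.
baseline = {name: p.detach().clone() for name, p in model.named_parameters()}

def drift_from_baseline(model: nn.Module) -> float:
    """L2 distance between the current weights and the baseline snapshot."""
    total = sum(
        (p.detach() - baseline[name]).pow(2).sum().item()
        for name, p in model.named_parameters()
    )
    return total ** 0.5

# Called after each single-sentence update, a steadily growing value warns
# that the learning rate (or number of updates) is pushing the model too far.
print(f"drift: {drift_from_baseline(model):.6f}")  # 0.0 before any update
```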

Kind regards,
Yasmin

Yes, it can. But it’s not without risks, such as catastrophic forgetting.

Hello!

Alternatively, an adaptive machine translation approach (adopted by ModernMT, for example) can be used; it is elaborated on in this paper:

Basically, the steps are as follows (see the code sketch after the list):

  1. Given a source input q (this can range from a single translation unit to an entire document), extract from the dataset/TM the top (source, target) pairs in terms of similarity between their source side and q.
  2. Use the retrieved pairs to fine-tune the baseline model, which is then applied to translate q.
  3. After the translator edits the MT translation and approves it, add it to the dataset/TM.
  4. Reset the adapted model to the original parameters, translate the next input source, and so on.
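
As a sketch of this loop in code, assuming a tiny in-memory TM: `difflib` supplies the fuzzy matching, while `fine_tune()` and `translate()` are hypothetical placeholders for the actual OpenNMT training and inference calls. Fine-tuning a *copy* of the baseline makes the reset in step 4 implicit:

```python
import copy
import difflib

# Tiny in-memory TM; entries are (source, target) pairs.
translation_memory = [
    ("the contract shall remain in force", "le contrat restera en vigueur"),
    ("the contract may be terminated", "le contrat peut être résilié"),
]

def retrieve(query: str, tm: list, top_k: int = 2) -> list:
    """Step 1: top (source, target) pairs by source-side similarity to q."""
    return sorted(
        tm,
        key=lambda pair: difflib.SequenceMatcher(None, query, pair[0]).ratio(),
        reverse=True,
    )[:top_k]

def fine_tune(params: dict, pairs: list) -> dict:
    """Step 2 (placeholder): in reality, a few training steps on the pairs."""
    params["adapted_on"] = [src for src, _ in pairs]
    return params

def translate(params: dict, query: str) -> str:
    """Step 2 (placeholder): decode the query with the adapted model."""
    return f"<translation of {query!r}>"

baseline = {"adapted_on": []}  # stand-in for the baseline model parameters

for query in ["the contract shall be renewed", "payment is due in 30 days"]:
    # Steps 1-2: fine-tune a copy of the baseline, then translate q.
    adapted = fine_tune(copy.deepcopy(baseline), retrieve(query, translation_memory))
    hypothesis = translate(adapted, query)
    # Step 3: after the translator approves the output, it joins the TM.
    translation_memory.append((query, hypothesis))
    # Step 4: drop the adapted copy; `baseline` was never modified, so the
    # next segment starts again from the original parameters.
```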

This approach can be less risky than updating the model segment by segment, as resetting the model (step 4) helps avoid the “catastrophic forgetting” that could result from fine-tuning the baseline model on only a few sentences.

@Nart @prashanth @francoishernandez

Kind regards,
Yasmin

@ymoslem Thanks for sharing :+1: