Warmup configuration for fine tuning

portia · December 7, 2019, 11:48am

Hi there,

I have been doing domain adaptation with OpenNMT for some months.
I have just realized that I am using the same decay and warmup configuration for both the out-of-domain model and the in-domain model, so I would like to apply the best approach for the in-domain model.

What would you consider the right warmup strategy for the in-domain training?

Thanks for your help

guillaumekln · December 8, 2019, 2:24pm

How exactly are you starting the training on the in-domain data?

Assuming you are training Transformer models with the automatic learning rate decay strategy (NoamDecay), it is generally enough to just continue the training on the new data without changing anything else.
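For reference, here is a minimal Python sketch of the Noam schedule as described in the original Transformer paper. The function name `noam_lr` and the values `d_model=512`, `warmup_steps=4000`, and `scale=1.0` are illustrative assumptions, not necessarily what your OpenNMT configuration uses. The point is that the rate depends only on the global step, so continuing the training simply carries on along the same decay curve.

```python
def noam_lr(step, d_model=512, warmup_steps=4000, scale=1.0):
    """Noam schedule: linear warmup followed by inverse-square-root decay.

    All default values are illustrative assumptions.
    """
    step = max(step, 1)  # guard against step 0, where step ** -0.5 is undefined
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Print the rate at a few global steps to see warmup, peak, and decay.
for step in (100, 2000, 4000, 50_000, 200_000):
    print(step, noam_lr(step))
```

The rate grows linearly until `warmup_steps`, peaks there, and then decays with the inverse square root of the step.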

portia · December 11, 2019, 1:20am

Thanks for your help, Guillaume, it’s highly appreciated.

elen · December 11, 2019, 11:26pm

If you start training from a pretrained model, it will resume training with the configuration it had when it stopped. This means that you will most likely be training the model with a very small learning rate. I would suggest adding a warmup period.

See this issue for how to reset the learning rate and warmup (and other hyperparameters): https://github.com/OpenNMT/OpenNMT-py/issues/768
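To make the difference concrete, here is a rough Python sketch, assuming a Noam-style schedule. The function `finetune_lr`, its parameters, and all the numbers (200,000 pretraining steps, `d_model=512`, 4,000 warmup steps) are hypothetical and only for illustration; this is not OpenNMT code. It compares the learning rate when the step counter continues from the pretrained checkpoint with the rate when the counter is reset so that warmup is replayed.

```python
def finetune_lr(finetune_step, pretrained_steps=200_000, reset_schedule=False,
                d_model=512, warmup_steps=4000):
    """Learning rate seen during fine-tuning under a Noam-style schedule.

    reset_schedule=False: the step counter continues from the checkpoint.
    reset_schedule=True:  the step counter restarts, so warmup happens again.
    All default values are illustrative assumptions.
    """
    step = finetune_step if reset_schedule else pretrained_steps + finetune_step
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Columns: fine-tuning step, rate when continuing, rate when reset.
for s in (1, 1000, 4000, 20_000):
    print(s, finetune_lr(s), finetune_lr(s, reset_schedule=True))
```

With the counter continued, the rate stays small and nearly flat; with a reset, it starts close to zero, climbs to the peak at the end of warmup, and then decays again.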