OpenNMT-tf fine-tune an existing model

Hi.
I’m trying to fine-tune the model for domain adaptation.

My steps:

  1. Trained a general model on 57 million lines of training data and 3,000 lines of validation data;
  2. Prepared datasets for the automotive domain: 4,500 lines for training and 1,500 lines for validation;
  3. Created a new directory for training and put config.yml and model.py files there, identical to the ones used for the general training.
  4. Left the vocabularies unchanged.
  5. Set the path to the general model's checkpoint and launched a new training run.

Result:

  1. BLEU on the validation data grows very fast during training. Within just 3 epochs it reaches 80 points and keeps climbing towards almost 100.
  2. The actual translation quality becomes significantly worse compared to the general model (-15 BLEU).

What am I doing wrong?
Is it right to fine-tune on the new data only, or is it better to mix it with the original dataset and continue training?
Can you share general tips on fine-tuning models?

Thank you!

Hello!

Mixed fine-tuning is better (Chu et al., 2017). You don’t use all of the original data, just a portion of it, and you over-sample the in-domain data.
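
For illustration, here is a minimal Python sketch of building such a mixed corpus: keep a random portion of the general-domain data and over-sample the in-domain data before shuffling everything together. The file names, sample size, and over-sampling factor are placeholders, not recommended values; adjust them to your data.

```python
import random

# Placeholder file names and ratios -- adjust to your setup.
GENERAL_SRC, GENERAL_TGT = "general.train.src", "general.train.tgt"
INDOMAIN_SRC, INDOMAIN_TGT = "auto.train.src", "auto.train.tgt"
MIXED_SRC, MIXED_TGT = "mixed.train.src", "mixed.train.tgt"

GENERAL_SAMPLE_SIZE = 500_000   # portion of the large general corpus to keep
OVERSAMPLE_FACTOR = 20          # repeat the small in-domain set this many times

def read_parallel(src_path, tgt_path):
    """Read a parallel corpus as a list of (source, target) line pairs."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return list(zip(fs.read().splitlines(), ft.read().splitlines()))

random.seed(42)

# NOTE: this loads everything in memory; for a 57M-line corpus you may prefer
# streaming or a tool like `shuf` instead.
general = read_parallel(GENERAL_SRC, GENERAL_TGT)
in_domain = read_parallel(INDOMAIN_SRC, INDOMAIN_TGT)

# Keep only a random portion of the general-domain data ...
mixed = random.sample(general, min(GENERAL_SAMPLE_SIZE, len(general)))
# ... and over-sample the in-domain data so it is not drowned out.
mixed += in_domain * OVERSAMPLE_FACTOR

random.shuffle(mixed)

with open(MIXED_SRC, "w", encoding="utf-8") as fs, open(MIXED_TGT, "w", encoding="utf-8") as ft:
    for src, tgt in mixed:
        fs.write(src + "\n")
        ft.write(tgt + "\n")
```

You would then point your training configuration at the mixed files and continue training from the general checkpoint as before.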

Your in-domain dataset is also too small. In a second experiment, you can try data augmentation techniques.
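
One common option is back-translation: translate in-domain monolingual target-side text into the source language with a reverse model and add the synthetic pairs to your training data. A rough sketch, where `translate_to_source` is a hypothetical placeholder for whatever target-to-source model you have available:

```python
def translate_to_source(target_sentence: str) -> str:
    # Hypothetical placeholder: plug in your target-to-source model here
    # (e.g. another model run in inference mode).
    raise NotImplementedError

def back_translate(mono_tgt_path, out_src_path, out_tgt_path):
    """Create synthetic (source, target) pairs from in-domain target-side monolingual text."""
    with open(mono_tgt_path, encoding="utf-8") as f_in, \
         open(out_src_path, "w", encoding="utf-8") as f_src, \
         open(out_tgt_path, "w", encoding="utf-8") as f_tgt:
        for tgt in f_in:
            tgt = tgt.strip()
            if not tgt:
                continue
            f_src.write(translate_to_source(tgt) + "\n")  # synthetic source
            f_tgt.write(tgt + "\n")                       # genuine in-domain target
```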

I have a tutorial here:

You can also check out our paper that used mixed fine-tuning and reported the results:

All the best,
Yasmin
