OpenNMT-tf fine-tune an existing model

I’m trying to fine-tune the model for domain adaptation.

My steps:

  1. Trained a general model on 57 million rows of training data and 3,000 rows of validation data.
  2. Prepared datasets for the automotive domain: 4,500 lines for training and 1,500 lines for validation.
  3. Created a new directory for training and put config.yml and files identical to those used for the general training there.
  4. Left the dictionaries unchanged.
  5. Set the path to the checkpoint of the general model and launched a new training run.


What I observe:

  1. BLEU on the validation data grows very fast during training. After just 3 epochs it reaches 80 and keeps climbing to almost 100.
  2. Translation quality becomes significantly worse than with the general model (-15 BLEU).

What am I doing wrong?
Is it right to fine-tune using only new data, or is it better to mix them with the original dataset and continue training?
Can you share general tips on fine-tuning models?

Thank you!


Mixed fine-tuning is better (Chu et al., 2017). You don’t use all of the original data, just a portion of it, and you over-sample the in-domain data.
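
To make the idea concrete, here is a minimal Python sketch of how such a mixed training set could be assembled: take a random portion of the general data, repeat the in-domain data a few times, and shuffle the result. The function name `mix_for_fine_tuning`, the sample size, and the over-sampling factor are all hypothetical and chosen only for illustration; this is not taken from Chu et al. (2017) or from OpenNMT-tf, and the right ratio will depend on your data.

```python
import random


def mix_for_fine_tuning(general_pairs, in_domain_pairs,
                        general_sample_size, oversample_factor, seed=0):
    """Build a mixed training set from (source, target) sentence pairs.

    Hypothetical helper for illustration only: the sizes and the factor
    are assumptions, not recommended values.
    """
    rng = random.Random(seed)

    # Keep only a random portion of the general data (no repeats).
    k = min(general_sample_size, len(general_pairs))
    general_subset = rng.sample(general_pairs, k)

    # Over-sample the in-domain data by repeating it.
    in_domain_oversampled = list(in_domain_pairs) * oversample_factor

    # Shuffle so both kinds of pairs are spread across the training data.
    mixed = general_subset + in_domain_oversampled
    rng.shuffle(mixed)
    return mixed


# Toy usage: 100 general pairs, 5 in-domain pairs.
general = [(f"general src {i}", f"general tgt {i}") for i in range(100)]
in_domain = [(f"auto src {i}", f"auto tgt {i}") for i in range(5)]

mixed = mix_for_fine_tuning(general, in_domain,
                            general_sample_size=20, oversample_factor=4)
print(len(mixed))  # 20 general pairs + 5 * 4 in-domain pairs
```

The source and target sides are kept together as pairs so that shuffling cannot misalign them; you would write the two sides back out to separate files before training.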

Your in-domain dataset is also too small. As a second experiment, you can try data augmentation techniques.

I have a tutorial here:

You can also check out our paper that used mixed fine-tuning and reported the results:

All the best,
