Best Practices for Fine-Tuning OpenNMT Models with Limited Data

Hi everyone,

I’ve been experimenting with OpenNMT for a translation project, and I’m facing some challenges with fine-tuning a pre-trained model using a relatively small dataset. My dataset consists of around 10,000 sentence pairs in a low-resource language, and I want to make sure I’m taking the right steps to achieve good performance without overfitting.

Here are a few specific questions I’m hoping the community can help with:

  1. Batch Size and Learning Rate: What batch size and learning rate would you recommend when fine-tuning on such a small dataset? Should I start with the default values or adjust them based on dataset size?
  2. Regularization Techniques: Are there specific regularization techniques (e.g., dropout) that work particularly well in scenarios with limited data?
  3. Preprocessing: My data is already tokenized and cleaned, but would applying additional techniques like subword tokenization (e.g., BPE) offer noticeable improvements?
  4. Evaluation Metrics: What’s the best way to monitor progress during fine-tuning? BLEU score? Loss? Or is there something else I should focus on?

I checked this thread: https://forum.opennmt.net/t/fine-tune-opennmt-model-on-domaiDevOpstraining but I haven't found a solution there. Could anyone guide me on this? If anyone has successfully fine-tuned models under similar conditions, I'd love to hear about your experiences and any tips you can share.

Thanks in advance for your help!

Here are the answers:

  1. For a small dataset, use small batches (16–32), but experiment with smaller sizes (e.g., 8 or 4) if you hit memory limits or if your model overfits. A learning rate between 0.00001 and 0.001 is a good starting point; I would test a few values (e.g., 0.00001, 0.00005, and 0.0001) and use a learning rate scheduler to adjust it dynamically during training. See the config sketch after this list.
  2. A dropout rate of around 0.1–0.3 should work well. Also try weight decay (L2 regularization); typical values are 0.001 or 0.01. The dropout setting is shown in the config sketch below.
  3. Use BPE; it helps handle out-of-vocabulary words, which matters especially for low-resource languages (see the SentencePiece sketch below).
  4. For evaluation metrics we use COMET only; it tracks human judgments of translation quality more closely than BLEU (a minimal scoring sketch follows below).
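
To make items 1 and 2 concrete, here is a minimal fine-tuning sketch in OpenNMT-py's YAML config format. All paths, the checkpoint name, and the exact values are placeholders to adapt, and you should check the option names against your OpenNMT-py version:

```yaml
# Minimal OpenNMT-py fine-tuning config (finetune.yaml).
# Paths, checkpoint name, and values are placeholders.
src_vocab: data/vocab.src
tgt_vocab: data/vocab.tgt

data:
    corpus_1:
        path_src: data/train.src
        path_tgt: data/train.tgt
    valid:
        path_src: data/valid.src
        path_tgt: data/valid.tgt

# Continue training from the pre-trained checkpoint.
train_from: models/pretrained_step_100000.pt
save_model: models/finetuned

# Item 1: small batches and a conservative fixed learning rate.
batch_size: 16
batch_type: sents
optim: adam
learning_rate: 0.0001
# Optionally enable a scheduler, e.g. decay_method: noam with warmup_steps.

# Item 2: stronger dropout against overfitting.
dropout: [0.3]
attention_dropout: [0.3]

# Validate and checkpoint often so you can stop early.
train_steps: 20000
valid_steps: 500
save_checkpoint_steps: 500
```

Launch it with `onmt_train -config finetune.yaml` and watch the validation scores to decide when to stop.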
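
For item 3, one common route is to train a BPE model with SentencePiece and apply it to the data before building vocabularies. The file paths and vocab size below are assumptions for illustration:

```python
import sentencepiece as spm

# Train a joint BPE model on source + target training text
# (paths are hypothetical; ~8k merges suit a ~10k-pair corpus).
spm.SentencePieceTrainer.train(
    input="data/train.src,data/train.tgt",
    model_prefix="bpe",
    model_type="bpe",
    vocab_size=8000,
    character_coverage=1.0,
)

# Apply the model: encode raw text into subword pieces.
sp = spm.SentencePieceProcessor(model_file="bpe.model")
print(sp.encode("this is a test sentence", out_type=str))
```

Encode both sides of the parallel data with this model before training, and turn the system output back into plain text with `sp.decode(...)`.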
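
For item 4, the Unbabel `comet` package (`pip install unbabel-comet`) scores source / hypothesis / reference triples. The checkpoint name below is one published COMET model, and the data is a toy example:

```python
from comet import download_model, load_from_checkpoint

# Fetch and load a reference-based COMET checkpoint.
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each item needs the source, the system translation, and a reference.
data = [
    {
        "src": "Ceci est un test.",
        "mt": "This is a test.",
        "ref": "This is a test.",
    },
]

# gpus=0 runs on CPU; the result holds per-segment and corpus-level scores.
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment
print(output.system_score)  # corpus-level
```

Score a held-out validation set at each saved checkpoint and stop fine-tuning once the system score plateaus.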