Signs of overfitting / parameters / regularization

Hello there,

I have been playing around with OpenNMT-py, and for languages where I used fewer than 130k sentences the default parameters from OpenNMT seem to do the trick, but over roughly 130k I believe it starts overfitting.

  1. I’ve searched the forum and I’m not sure I understood exactly how to determine when it’s overfitting. My understanding is that it’s when perplexity and accuracy stop improving and “flatten”.
    Which is exactly what is happening in my case when I use a dataset with 400k sentences. I get really good results up to 130k (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195), but then it stops improving, even decreases slightly, and then re-increases every 20k sentences.

  2. I tried setting learning_rate to 0.1 and learning_rate_decay to 0.9, but I get even worse results. At step 210k the results are no longer improving and look like this: (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195). If you have any suggestions on which parameters to use, they would be welcome!

  3. I was planning to try using regularization, but I’m not sure how it works. Do I first need to train until perplexity and accuracy reach a plateau, and then regenerate my YAML file with the dropout option and start training again?

Thank you!

I have been playing around with OpenNMT-py, and for languages where I used fewer than 130k sentences the default parameters from OpenNMT seem to do the trick, but over roughly 130k I believe it starts overfitting.

Overfitting happens when training a big model on not enough data. Basically, the model learns the training examples “by heart” and cannot generalize properly to other examples.
What you describe is the other way around: more data normally makes overfitting less likely, not more.
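
For reference, the clearest sign of overfitting is a divergence between training and validation metrics: training perplexity keeps improving while validation perplexity stalls or climbs. In OpenNMT-py you can watch this by declaring a held-out valid corpus and setting valid_steps. A minimal sketch is below (paths are placeholders and option names follow recent OpenNMT-py versions, so check them against the docs for your install):

```yaml
# Minimal sketch: report validation metrics during training so that
# training vs. validation perplexity can be compared over time.
# Paths are placeholders; adapt them to your own data layout.
data:
    corpus_1:
        path_src: data/src-train.txt
        path_tgt: data/tgt-train.txt
    valid:                        # held-out set, never trained on
        path_src: data/src-val.txt
        path_tgt: data/tgt-val.txt

valid_steps: 5000                 # run validation every 5k training steps
save_checkpoint_steps: 5000       # keep checkpoints so you can roll back to the best one
```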

I’ve searched the forum and I’m not sure I understood exactly how to determine when it’s overfitting. My understanding is that it’s when perplexity and accuracy stop improving and “flatten”.
Which is exactly what is happening in my case when I use a dataset with 400k sentences. I get really good results up to 130k (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195), but then it stops improving, even decreases slightly, and then re-increases every 20k sentences.

I think you’re confusing steps and sentences. If so, it may just be that your dataset is not properly shuffled and hits some parts that hurt performance (e.g. too far from the original domain) around 130k steps.

I tried setting learning_rate to 0.1 and learning_rate_decay to 0.9, but I get even worse results. At step 210k the results are no longer improving and look like this: (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195). If you have any suggestions on which parameters to use, they would be welcome!

If you want any chance of getting specific advice like that, you need to post your full config.
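
For reference, the part of the config that matters most for this question is the optimization section; a hypothetical sketch of it is below (values are illustrative, not recommendations, and option names follow recent OpenNMT-py versions):

```yaml
# Sketch of the optimization-related options in an OpenNMT-py config.
# Values are placeholders for illustration, not tuned recommendations.
optim: adam
learning_rate: 2.0
decay_method: noam          # Transformer-style warmup followed by inverse-sqrt decay
warmup_steps: 8000
# Alternatively, a fixed rate with stepwise decay (closer to what was tried above):
# learning_rate: 0.1
# learning_rate_decay: 0.9
# start_decay_steps: 50000
# decay_steps: 10000
batch_size: 4096
batch_type: tokens
max_grad_norm: 0
train_steps: 200000
```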

I was planning to try using regularization, but I’m not sure how it works. Do I first need to train until perplexity and accuracy reach a plateau, and then regenerate my YAML file with the dropout option and start training again?

Have a look at dropout and label smoothing, for instance. Increasing the effective batch size (through gradient accumulation, i.e. accum_count/accum_steps) might be of some help as well; see the sketch below.
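
As a concrete illustration, here is a hedged sketch of how those options might look in the training YAML (values are common Transformer-style defaults, not tuned recommendations; option names follow recent OpenNMT-py versions, where dropout and accumulation settings are given as lists paired with step schedules):

```yaml
# Sketch: regularization and gradient-accumulation options.
# Values are common defaults, not tuned for any particular dataset.
dropout: [0.3]              # dropout on layer outputs
attention_dropout: [0.1]    # dropout inside multi-head attention
dropout_steps: [0]          # apply the values above from step 0
label_smoothing: 0.1        # softens the one-hot targets
accum_count: [4]            # accumulate gradients over 4 batches per update
accum_steps: [0]            # apply that accumulation from step 0
batch_size: 4096
batch_type: tokens          # effective batch: about 4 x 4096 tokens per update
```

These options are typically set from the start of training rather than added only after the metrics reach a plateau.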

Also, you could try adding some back-translated data.