I have been playing around with OpenNMT-py, and for languages where I used fewer than 130k sentences the default OpenNMT parameters seem to do the trick, but above roughly 130k I believe it starts overfitting.
I've searched the forum and I'm not sure I understood exactly how to determine when it's overfitting. My understanding is that it's when perplexity and accuracy stop improving and "flatten".
That is exactly what is happening in my case with a dataset of 400k sentences. I get really good results up to 130k (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195), but then it stops improving, and it even dips slightly and recovers again every 20k steps.
I tried setting learning_rate to 0.1 and learning_rate_decay to 0.9, but I got even worse results. At step 210k the metrics are no longer improving and look like this: (acc: 85.08; ppl: 1.76; xent: 0.57; lr: 0.00195). Any suggestions on which parameters to use would be welcome!
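For reference, here is roughly what I changed in my yaml (a sketch, not my full config; start_decay_steps and decay_steps are my assumption for how OpenNMT-py controls when and how often the decay is applied):

```yaml
# Optimizer settings I tried (sketch)
learning_rate: 0.1
learning_rate_decay: 0.9
# I assume these control when decay starts and how often it fires:
start_decay_steps: 50000
decay_steps: 10000
```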
I was planning to try regularization, but I'm not sure how it works. Do I first need to train until perplexity and accuracy reach a plateau, then regenerate my yaml file with the dropout option and resume training?
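If it helps, this is the kind of change I had in mind for the yaml (assuming dropout and attention_dropout are the right option names, and that train_from is how you resume; the checkpoint path is just illustrative):

```yaml
# Regularization options I was planning to add (sketch)
dropout: [0.3]
attention_dropout: [0.1]
# resume from the checkpoint where the plateau started (illustrative path)
train_from: model_step_130000.pt
```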