Slow convergence on training a transformer model for Italian-English

kasuo46 · August 2, 2019, 9:23pm

Hi,

I am trying to train a Transformer model for Italian-English using a parallel corpus. I am training on ~1M sentence pairs and validating on ~50K. I used the hyperparameters provided in the FAQ of the documentation: http://opennmt.net/OpenNMT-py/FAQ.html

I have trained it for 1 million steps, but it still has not converged; the validation accuracy is currently around 50%.
I am trying to figure out how to make it converge faster. Any suggestions are appreciated. Thanks for your help!

I am attaching my training curve below:

ymoslem · August 4, 2019, 8:19am

Dear Alex,

One million segments can give good results if the dataset consists of relatively short sentences and is very specialized; for longer sentences and/or a generic dataset, you need more data to get better results.
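One way to check the first condition is to look at sentence lengths on each side of the corpus. This is a minimal sketch, assuming whitespace tokenization (a real pipeline would more likely measure after its own tokenizer or subword model). The function name `length_stats` is hypothetical, and since the thread does not define what counts as "relatively short", no threshold is applied.

```python
from statistics import mean, median
from typing import Dict, Iterable


def length_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Sentence-length statistics for one side of a corpus.

    Assumption: length is counted in whitespace-separated tokens,
    not in the subword units the model actually sees.
    """
    lengths = [len(line.split()) for line in lines]
    if not lengths:
        raise ValueError("empty corpus")
    return {
        "sentences": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }
```

For example, `length_stats(open("train.it", encoding="utf-8"))` would summarize the Italian side (the file name is hypothetical); running it on both sides gives a rough picture of how long the segments are.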

Also, 50K sentences is a lot for validation; the usual standard is 5K at most.
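Reducing the validation set to about 5K means sampling sentence pairs while keeping the Italian and English sides aligned line by line. This is a minimal sketch, assuming both sides are already loaded as lists of lines. The function name, the fixed seed, and returning the leftover pairs separately (so they are not simply discarded) are illustrative choices, not part of OpenNMT-py.

```python
import random
from typing import List, Tuple


def split_validation(
    src: List[str], tgt: List[str], k: int = 5000, seed: int = 1234
) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Randomly pick k aligned pairs for validation.

    Returns (rest_src, rest_tgt, valid_src, valid_tgt). Line i of the
    source always stays paired with line i of the target.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    k = min(k, len(src))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    valid_idx = set(rng.sample(range(len(src)), k))

    rest_src, rest_tgt, valid_src, valid_tgt = [], [], [], []
    for i, (s, t) in enumerate(zip(src, tgt)):
        if i in valid_idx:
            valid_src.append(s)
            valid_tgt.append(t)
        else:
            rest_src.append(s)
            rest_tgt.append(t)
    return rest_src, rest_tgt, valid_src, valid_tgt
```

Sampling indices once and applying them to both lists is what keeps the pairs aligned; shuffling the two files independently would silently break the parallel corpus.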

Kind regards,
Yasmin

kasuo46 · August 5, 2019, 5:35pm

Hi Yasmin,

Thanks for your suggestion! I appreciate it.

Alex