Slow convergence on training a transformer model for Italian-English


I am trying to train a transformer model for Italian-English using a parallel corpus. I am training on ~1M sentences and validating on ~50K sentences. I used the hyperparameters provided in the FAQ section of the documentation.

I have trained it for 1 million steps, but it still has not converged. The validation accuracy is currently around 50.
I am trying to figure out how to make it converge faster. Any suggestion is appreciated. Thanks for your help!

I am attaching my training curve below:

Dear Alex,

One million segments can give good results if the dataset consists of relatively short sentences and is very specialized; otherwise, for longer sentences and/or a generic dataset, you need more data to get better results.

Also, 50K sentences is far too many for validation; the usual practice is 5K at most.
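If it helps, here is a minimal sketch of how the validation set could be cut down to 5K pairs while keeping the source and target lines aligned. It is only an illustration: the function name subsample_parallel and its parameters are made up for this example and are not part of any toolkit, and it assumes both sides are already loaded as lists with one sentence per entry.

```python
import random


def subsample_parallel(src_lines, tgt_lines, k=5000, seed=1234):
    """Return the same k randomly chosen sentence pairs from two aligned lists.

    Hypothetical helper for illustration; assumes src_lines[i] is the
    translation counterpart of tgt_lines[i].
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    # Never ask for more pairs than are available.
    k = min(k, len(src_lines))
    # A fixed seed makes the selection reproducible across runs.
    rng = random.Random(seed)
    # Sample indices once and reuse them for both sides to keep alignment.
    indices = sorted(rng.sample(range(len(src_lines)), k))
    return [src_lines[i] for i in indices], [tgt_lines[i] for i in indices]
```

You would read the two validation files into lists, call the function once, and write the two returned lists back out as the new validation files. The pairs that are left out could be added to the training data, provided they do not overlap with the test set.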

Kind regards,

Hi Yasmin,

Thanks for your suggestion! I appreciate it.