Overfitting and overtraining

What is the recommended strategy for dealing with overtraining? Is there any functionality to automatically stop training when performance on the validation data gets worse?

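“Early stopping” is what you are looking for: training stops once the validation metric has not improved for a configured number of consecutive evaluations. It is available in both OpenNMT-py and OpenNMT-tf.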
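For illustration, the general idea behind early stopping can be sketched in a few lines of Python. This is a generic, patience-based sketch, not OpenNMT's implementation: the function names `should_stop` and `train_with_early_stopping` and the parameters `patience` and `min_improvement` are hypothetical. For the real option names and defaults, check the OpenNMT-py and OpenNMT-tf documentation.

```python
from typing import Sequence


def should_stop(history: Sequence[float], patience: int,
                min_improvement: float = 0.0,
                higher_is_better: bool = False) -> bool:
    """Hypothetical helper: return True if none of the last `patience`
    validation scores beat the best earlier score by more than
    `min_improvement`."""
    if len(history) <= patience:
        return False  # not enough evaluations yet to judge
    reference = history[:-patience]  # scores before the patience window
    recent = history[-patience:]     # the last `patience` scores
    if higher_is_better:             # e.g. BLEU or accuracy
        return max(recent) - max(reference) <= min_improvement
    # lower is better, e.g. validation loss or perplexity
    return min(reference) - min(recent) <= min_improvement


def train_with_early_stopping(validation_scores: Sequence[float],
                              patience: int = 3,
                              min_improvement: float = 0.01) -> int:
    """Simulate a training loop that runs one validation per step and
    return the step at which training stops."""
    history = []
    for step, score in enumerate(validation_scores, start=1):
        history.append(score)  # in real training: evaluate on the dev set here
        if should_stop(history, patience, min_improvement):
            return step
    return len(validation_scores)


# Validation loss improves, then starts rising as the model overfits.
losses = [2.0, 1.5, 1.2, 1.1, 1.105, 1.12, 1.15, 1.2]
print(train_with_early_stopping(losses))
```

With these illustrative losses, training stops at step 7 because the three evaluations after the best loss (1.1) never improve on it. In practice you would also keep the checkpoint with the best validation score rather than the last one.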

In my experience, overfitting is rarely a concern when training standard Transformer models on a few million sentences.

