Training performance boost with OpenNMT-tf v2.17.0!

@guillaumekln ,
It seems a little magic happened in the latest version of OpenNMT-tf (v2.17.0): I'm seeing a ~20% throughput increase during training! Is it because of the switch to Keras mixed precision? Well done!


Yes, this is mainly related to the mixed precision update. We also revised the loss definition to avoid rescaling the gradients later. This improves the performance a bit more.
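For reference, here is a minimal sketch of how Keras mixed precision is typically enabled in TensorFlow 2.4+. This is the generic Keras API, not necessarily the exact code path used inside OpenNMT-tf:

```python
import tensorflow as tf

# Enable the Keras mixed precision policy: layers compute in float16
# while variables are kept in float32 for numerical stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Wrap the optimizer with dynamic loss scaling so that small float16
# gradients do not underflow during the backward pass.
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
```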

Thanks for testing! Let me know if you find any issues related to these changes.


Hi @guillaumekln ,

Now I'm facing some OOM issues with a new training. It seems I cannot affect the amount of GPU memory used, even when I reduce batch_size and/or sample_buffer_size.

In the previous successful training, I had left batch_size at 2000 without defining sample_buffer_size. I noticed that the shuffle buffer used a higher value than it used to (from 5000000 to ~6300000) and GPU memory was nearly maxed out on both GPUs (11004 out of 11016 MiB and 11008 out of 11019 MiB, according to nvidia-smi). Still, training completed with no OOM.

Now I'm training in the other language direction with almost the same corpus, but training fails at the evaluation step every time. I lowered batch_size as far as 1586 and set sample_buffer_size to 5000000, but memory usage remains unchanged.

In both the previous training and the current one, I'm training with mixed precision and Horovod.

I did notice some related code changes you made on GitHub though; hopefully they address these issues too, besides auto-tuning? :slight_smile:

Hi,

sample_buffer_size does not affect GPU memory, only CPU memory. Also, TensorFlow reserves all available GPU memory by default, which is why you are not seeing any change when reducing the batch size.
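If you want nvidia-smi to reflect actual usage rather than the full pre-allocation, TensorFlow can be told to grow memory on demand. A generic sketch using the standard TensorFlow API (this is not an OpenNMT-tf option):

```python
import tensorflow as tf

# By default TensorFlow maps nearly all free GPU memory at startup, so
# nvidia-smi shows the GPU as full regardless of the actual batch size.
# Enabling memory growth makes TensorFlow allocate memory only as needed,
# which makes the reported usage more meaningful. It does not increase
# the total memory available, so real OOMs will still occur.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```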

Is it possible that you have a very long sentence in the evaluation file?

Indeed, I found a bug in my preprocessing pipeline that created huge lines. Thanks and sorry for the trouble.
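For anyone hitting the same symptom, a quick way to spot abnormally long lines in an evaluation file; the file path and token threshold below are just placeholders:

```python
# Report lines whose token count exceeds a threshold; such outliers can
# blow up memory at evaluation time. Path and threshold are examples.
max_tokens = 200

with open("eval.src", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        num_tokens = len(line.split())
        if num_tokens > max_tokens:
            print(f"line {i}: {num_tokens} tokens")
```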
