Utilization of Tensor Cores

Hello there! I have recently begun experimenting with transformer models in OpenNMT-py, and I have the good fortune of having an RTX 3090.

While looking for ways to optimize training speed as much as possible, I came across a mention of "Mixed-precision training with APEX, optimized on Tensor Cores".

The models I train already have a model_dtype of fp16, but I can't find an option to explicitly enable Tensor Cores. So my question is: how do I use Tensor Cores when training with OpenNMT-py, if setting fp16 doesn't already do so?



Tensor Cores are used automatically, but you need to make sure the model dimensions are all multiples of 8.
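A quick sanity check along these lines can be done in plain Python. The sketch below is illustrative only — the dictionary keys mirror common transformer options but are not exact OpenNMT-py flag names:

```python
def non_tensor_core_friendly(dims):
    """Return the (name, value) pairs that are NOT divisible by 8.

    fp16 matmuls map onto Tensor Cores most efficiently when the
    participating dimensions are multiples of 8, so an empty result
    means all listed dimensions qualify.
    """
    return [(name, v) for name, v in dims.items() if v % 8 != 0]

# Illustrative values for a typical transformer config
# (names are hypothetical, not actual OpenNMT-py options).
dims = {
    "hidden_size": 512,
    "feed_forward_size": 2048,
    "head_dim": 64,
    "vocab_size": 50002,
}
print(non_tensor_core_friendly(dims))  # [('vocab_size', 50002)]
```

In this example the vocabulary size is the one dimension that would need padding (e.g. up to 50008) to stay Tensor Core friendly.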

Would this include the batch size?

Yes, but this is done automatically.
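For the curious, the automatic alignment amounts to rounding the batch size up to the nearest multiple of 8. A minimal sketch of that arithmetic (not the toolkit's actual code):

```python
def round_up_to_multiple(n, multiple=8):
    # Round n up to the nearest multiple of `multiple`, e.g. so a
    # token batch size stays Tensor Core friendly.
    return ((n + multiple - 1) // multiple) * multiple

print(round_up_to_multiple(4095))  # 4096
print(round_up_to_multiple(4096))  # 4096 (already aligned)
```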

Perfect, thank you