OpenNMT-py batch_size with multiple GPUs

I have a quick question regarding the meaning of the batch_size parameter with multiple GPUs in OpenNMT-py.

If training is distributed across N GPUs (world_size N), is the true batch size equal to N * batch_size? Or is the true batch size equal to batch_size?

The documentation for the Lua version of OpenNMT makes this clear here, but I can’t find it mentioned in the OpenNMT-py documentation.

Thanks!

Not sure what you want to call “true”, but here is how it works.

On each GPU you accumulate gradients over N minibatches (N = accum_count),
then all GPU gradients are “gathered” before the update.

So the “true” (effective) batch size is accum_count x world_size x batch_size.
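As a rough sketch of the arithmetic (the values below are made up for illustration, not taken from this thread):

```python
# Hypothetical settings for illustration only
batch_size = 4096    # tokens (or sentences) per minibatch on a single GPU
accum_count = 4      # minibatches accumulated on each GPU before syncing gradients
world_size = 2       # number of GPUs participating in training

# Effective ("true") batch size per optimizer update
effective_batch_size = accum_count * world_size * batch_size
print(effective_batch_size)  # 32768
```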


That is exactly what I was looking for. Thanks!