I have a quick question regarding the meaning of the batch_size parameter with multiple GPUs in OpenNMT-py.
If the training is distributed across N GPUs (world_size N), is the true batch size equal to N * batch_size, or is it equal to batch_size?
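To make the two readings concrete, here is a small sketch of what I mean (the numbers are hypothetical, and token-based batching with batch_size 4096 is just an example):

```python
# Illustration of the two possible readings, assuming a hypothetical run
# with world_size = 2 GPUs and -batch_size 4096 (token batching).
world_size = 2        # number of GPUs
batch_size = 4096     # value passed as -batch_size

# Reading 1: each GPU builds its own batch of `batch_size` tokens,
# so one optimizer step effectively sees world_size * batch_size tokens.
effective_reading_1 = world_size * batch_size   # 8192

# Reading 2: `batch_size` is the global batch, split across the GPUs,
# so one optimizer step still sees `batch_size` tokens in total.
effective_reading_2 = batch_size                # 4096

print(effective_reading_1, effective_reading_2)
```

Which of these two readings matches what OpenNMT-py actually does?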
The documentation for the Lua version of OpenNMT makes this clear here, but I can’t find this mentioned in the OpenNMT-py documentation.
Thanks!