Increasing effective batch size

Dmitry · July 4, 2023, 10:11pm

Hello! I’d be grateful if someone could give answers or some explanation to the following two questions:

1. How does increasing the effective batch size affect the final result of a trained model: does it get worse, get better, or stay the same? I am currently using an effective batch size of 25,000.
2. If I increase the effective batch size to 80,000 and train the model for 65,000 steps, will that be approximately equivalent to training with an effective batch size of 25,000 for 200,000 steps?
Many thanks in advance!

guillaumekln · July 17, 2023, 9:08am

Hi,

1. For Transformer training, increasing the effective batch size can improve the final results, speed up convergence, or both. See for example https://arxiv.org/pdf/1806.00187.pdf, which uses an effective batch size of 400k (and more than 600k for the ENFR training).
2. No, these are two different training runs. They will see approximately the same amount of data, but not under the same learning rate regime, for example, since the learning rate schedule typically depends on the step number rather than on the amount of data seen.
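
To make the second point concrete, here is a rough sketch comparing the two runs from the question. It assumes the inverse square root schedule with linear warmup that is commonly used for Transformer training; the thread does not say which schedule is actually in use, and `model_dim`, `warmup_steps`, and `scale` are illustrative values, not settings taken from the question.

```python
def data_seen(effective_batch_size: int, train_steps: int) -> int:
    """Total amount of data processed, in the same unit as the batch size."""
    return effective_batch_size * train_steps


def noam_learning_rate(step: int, model_dim: int = 512,
                       warmup_steps: int = 8000, scale: float = 2.0) -> float:
    """Inverse square root decay with linear warmup.

    Assumed schedule and assumed default values: the thread does not
    confirm which schedule or hyperparameters are used.
    """
    step = max(step, 1)
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Run A: effective batch size 25,000 for 200,000 steps.
# Run B: effective batch size 80,000 for 65,000 steps.
data_a = data_seen(25_000, 200_000)
data_b = data_seen(80_000, 65_000)
final_lr_a = noam_learning_rate(200_000)
final_lr_b = noam_learning_rate(65_000)

print(f"data seen: A={data_a:,} B={data_b:,} (ratio {data_b / data_a:.2f})")
print(f"final learning rate: A={final_lr_a:.2e} B={final_lr_b:.2e}")
```

Both runs process roughly the same amount of data (5.0 billion versus 5.2 billion units), but under this assumed schedule the shorter run finishes at a learning rate about 1.75 times higher, so the two trainings do not follow the same trajectory.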

Dmitry · July 18, 2023, 12:56pm

Thanks for your reply!

SamuelLacombe · July 20, 2023, 8:19pm

Hello,

To add to the question… what are the negative impacts of increasing the effective batch size?

Best regards,
Samuel

guillaumekln · July 21, 2023, 7:55am

I can’t think of any negative impacts.

However, for very large effective batch sizes a single step will take more time. So you might need to tune other parameters based on the step number: logging frequency, checkpoint saving frequency, etc.
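
As one possible way to do that tuning, the sketch below rescales step-based intervals so that roughly the same amount of data passes between two events after the batch size changes. The helper `rescale_interval`, the keys `checkpoint_every` and `log_every`, and the starting intervals are hypothetical and only for illustration; they are not option names from any particular toolkit.

```python
def rescale_interval(interval_steps: int, old_batch_size: int, new_batch_size: int) -> int:
    """Scale a step-based interval so it covers about the same amount of data.

    Hypothetical helper: a larger batch means more data per step, so the
    interval in steps shrinks by the ratio old_batch_size / new_batch_size.
    """
    return max(1, round(interval_steps * old_batch_size / new_batch_size))


# Illustrative intervals (in steps) tuned for an effective batch size of 25,000.
old_intervals = {"checkpoint_every": 5000, "log_every": 100}

# Equivalent intervals after moving to an effective batch size of 80,000.
new_intervals = {
    name: rescale_interval(steps, old_batch_size=25_000, new_batch_size=80_000)
    for name, steps in old_intervals.items()
}

print(new_intervals)
```

Proportional rescaling is only a starting point; the intervals can just as well be chosen from the wall-clock time per step.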