Hello! I’d be grateful if someone could answer or explain the following two questions:
- How does increasing the effective batch size parameter affect the final result of a trained model (does it get worse, better, or not change at all)? I'm currently using effective batch size = 25 000.
- If I increase the effective batch size to 80 000 and train the model for 65 000 steps, will that be approximately equivalent to training with effective batch size = 25 000 for 200 000 steps?
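A rough check on the second question: assuming each training step consumes one full effective batch, the total amount of data seen is effective batch size × training steps (counted in examples or tokens, depending on how the batch size is measured). The helper `total_processed` below is a hypothetical name used only for illustration. This compares data volume only. It does not by itself show that the two runs reach the same final quality.

```python
def total_processed(effective_batch_size: int, train_steps: int) -> int:
    """Total examples (or tokens) seen, assuming one full effective batch per step."""
    return effective_batch_size * train_steps


current = total_processed(25_000, 200_000)   # current setup
proposed = total_processed(80_000, 65_000)   # proposed setup

print(f"current:  {current:,}")
print(f"proposed: {proposed:,}")
print(f"ratio:    {proposed / current:.2f}")

# Steps the larger batch would need to see exactly the same total amount of data
print(f"steps for equal total: {current / 80_000:,.0f}")
```

This prints 5,000,000,000 vs 5,200,000,000 (ratio 1.04). So in terms of data seen, 65 000 steps at 80 000 is slightly more than 200 000 steps at 25 000, and about 62 500 steps would match it exactly.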
Many thanks in advance!