Hello! I’d be grateful if someone could give answers or some explanation to the following two questions:
- How does increasing the effective batch size parameter affect the final result of a trained model (does it get worse, better, or stay the same)? I am currently using effective batch size = 25 000.
- If I increase the effective batch size to 80 000 and train the model for 65 000 steps, will that be approximately equivalent to training the model with effective batch size = 25 000 for 200 000 train steps?
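To show why I expect the two runs to be comparable, here is a minimal sketch of the arithmetic. It assumes the amount of data processed is simply the effective batch size multiplied by the number of steps. The function name is only for illustration, and whether the batch size counts tokens or examples depends on the configuration.

```python
def total_processed(effective_batch_size: int, train_steps: int) -> int:
    """Units of data (tokens or examples) seen over a whole training run.

    Assumption: every step consumes exactly one effective batch.
    """
    return effective_batch_size * train_steps


# Current setup: effective batch size 25 000 for 200 000 steps.
current = total_processed(25_000, 200_000)

# Proposed setup: effective batch size 80 000 for 65 000 steps.
proposed = total_processed(80_000, 65_000)

# Ratio of data seen by the proposed run relative to the current one.
ratio = proposed / current

print(f"current:  {current:,}")
print(f"proposed: {proposed:,}")
print(f"ratio:    {ratio:.2f}")
```

The two totals come out within a few percent of each other, but seeing the same amount of data does not by itself guarantee the same model quality, since the number of parameter updates and the learning rate schedule differ. That is the part I am unsure about.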
Many thanks in advance!