Number of samples in one training step together with accumulation


How many samples does one training step cover when gradient accumulation is used?

For example, if the log output shows training step 1/1000 and you have 2 GPUs, a batch size of 6, and an accum_count of 2, does that one training step include 2 x 6 x 2 samples? Or does it only include 2 x 6 samples, meaning you do not see the actual optimization step happening?
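If the logged step corresponds to one optimizer update (which is the usual convention when accumulation is enabled), then all accumulated micro-batches across all GPUs are counted in that step. A minimal sketch of the arithmetic, using the numbers from the question above (the variable names are illustrative, not from any specific library):

```python
# Effective number of samples covered by one reported training step,
# assuming the step counter advances once per optimizer update.
num_gpus = 2      # data-parallel workers
batch_size = 6    # examples per GPU per forward pass
accum_count = 2   # micro-batches accumulated before the optimizer update

samples_per_step = num_gpus * batch_size * accum_count
print(samples_per_step)  # 24
```

So under this assumption, one step covers 2 x 6 x 2 = 24 samples, not 12.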


Ah, alright. That also explains my other question, since the printout about loading data seemed off. Thanks!

I found this topic really interesting.
If a dataset has 4500 examples and we have a batch_size of 4096, 1 GPU, and an accum_count of 4, I guess in each step the dataset is loaded 4 times. Is that correct?

A batch_size of 4096 probably refers to tokens, not examples.

But if batch_type were set to examples, yes, that would be the idea.

I see. Are examples taken until 4096 tokens are filled?
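That is the general idea of token-based batching: examples are appended to the current batch until adding the next one would exceed the token budget. A simplified sketch (real implementations typically also sort by length and account for padding, which this omits):

```python
# Simplified token-based batching: group examples so that each batch
# holds at most `max_tokens` tokens in total.
def token_batches(examples, max_tokens=4096):
    batch, tokens = [], 0
    for ex in examples:
        n = len(ex)  # token count of this example
        if batch and tokens + n > max_tokens:
            yield batch          # current batch is full; emit it
            batch, tokens = [], 0
        batch.append(ex)
        tokens += n
    if batch:
        yield batch              # emit the final partial batch

# Three examples of 2000 tokens each with a 4096-token budget:
# the first batch takes two examples (4000 tokens), the third
# example (which would push the total to 6000) starts a new batch.
batches = list(token_batches([[0] * 2000] * 3, max_tokens=4096))
print([len(b) for b in batches])  # [2, 1]
```

So batch size in examples varies from batch to batch; only the token count per batch stays roughly constant.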