I see that epochs are deprecated, but it's still interesting to me to think about how many "complete passes" over the training data have been performed. I'm having trouble figuring out from the documentation how to calculate this, because I don't fully understand what "accumulation" is.
If I run 1,000,000 examples on 1 GPU with
accum_count = 2 and
batch_size = 4096, then is 1 epoch = 1,000,000 / (4096 × 2) ≈ 122 steps? Are there other factors that can affect the step-vs-epoch relationship?
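To make my mental model concrete: if batch_size counts examples (not tokens) and each optimizer step consumes accum_count batches, I imagine the arithmetic would look something like the sketch below. This is just my guess at the semantics, not necessarily what the library actually does; `steps_per_epoch` is a made-up helper name.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int, accum_count: int) -> int:
    """Estimate optimizer steps per epoch, assuming batch_size is in
    examples and accum_count batches are accumulated per step."""
    # Examples consumed per optimizer step under this assumption:
    examples_per_step = batch_size * accum_count
    # Round up: a final partial accumulation presumably still triggers a step.
    return math.ceil(num_examples / examples_per_step)

print(steps_per_epoch(1_000_000, 4096, 2))  # -> 123 (1,000,000 / 8192 ≈ 122.07)
```

Is that roughly the right way to think about it, or does accumulation interact with batching differently (e.g. token-based batching)?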