I see that epochs are deprecated, but it's still interesting to me to think about how many "complete passes" over the training data have been performed. I'm having trouble figuring out from the documentation how to calculate this, because I don't fully understand what "accumulation" is.
If I run 1,000,000 examples on 1 GPU with
accum_count = 2 and
batch_size = 4096, then is 1 epoch = 1,000,000 / (4096 × 2) ≈ 122 steps? Are there other factors that can affect the step-vs-epoch relationship?
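To make my mental model concrete: if batch_size counts examples (not tokens) and each optimizer step consumes accum_count batches, I imagine the arithmetic would look something like the sketch below. This is just my guess at the semantics, not necessarily what the library actually does; `steps_per_epoch` is a made-up helper name.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int, accum_count: int) -> int:
    """Estimate optimizer steps per epoch, assuming batch_size is in
    examples and accum_count batches are accumulated per step."""
    # Examples consumed per optimizer step under this assumption:
    examples_per_step = batch_size * accum_count
    # Round up: a final partial accumulation presumably still triggers a step.
    return math.ceil(num_examples / examples_per_step)

print(steps_per_epoch(1_000_000, 4096, 2))  # -> 123 (1,000,000 / 8192 ≈ 122.07)
```

Is that roughly the right way to think about it, or does accumulation interact with batching differently (e.g. token-based batching)?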