Question about the relationship between train_steps and epochs

I see that epochs are deprecated, but I'm still interested in knowing how many "complete passes" over the training data have been performed. I'm having trouble figuring out how to calculate this from the documentation because I don't fully understand what "accumulation" means.

If I train on 1,000,000 examples on 1 GPU with accum_count = 2 and batch_size = 4096, is 1 epoch = 1,000,000 / (4096 × 2) ≈ 122 steps? Are there other factors that can affect the step-to-epoch relationship?
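
To make the arithmetic concrete, here is a minimal Python sketch of my assumption, namely that batch_size counts whole examples (which, per the reply below, may not be the case):

```python
# Sketch of the step/epoch arithmetic, ASSUMING batch_size counts
# whole examples rather than tokens.
num_examples = 1_000_000
batch_size = 4096      # examples per forward/backward pass (assumption)
accum_count = 2        # gradients accumulated over this many batches per step
world_size = 1         # number of GPUs

examples_per_step = batch_size * accum_count * world_size
steps_per_epoch = num_examples / examples_per_step
print(f"{steps_per_epoch:.0f} steps per epoch")  # -> 122
```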


Note the units: 1M examples = 1M segments, but batch_size = 4096 is measured in tokens, so you can't divide a segment count by a token count directly.
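
In other words, converting segments to steps needs the average segment length. A rough sketch, using a made-up average of 25 tokens per segment (substitute your corpus's real average):

```python
# If batch_size counts tokens (e.g. when batch_type is set to tokens),
# convert segments to tokens first. avg_tokens_per_segment is a
# hypothetical figure for illustration only.
num_segments = 1_000_000
avg_tokens_per_segment = 25          # hypothetical corpus average
batch_size_tokens = 4096
accum_count = 2
world_size = 1

total_tokens = num_segments * avg_tokens_per_segment
tokens_per_step = batch_size_tokens * accum_count * world_size
steps_per_epoch = total_tokens / tokens_per_step
print(f"~{steps_per_epoch:.0f} steps per epoch")  # -> ~3052
```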

I'm also confused about this.
Can anyone explain it in more detail?