Hi,
I am struggling to understand what accum_steps are.
From my understanding, if you have 1 GPU you run an optimizer step after (batch_size * accum_count) samples. In other words, you train as if each batch were made up of accum_count mini-batches.
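To illustrate the accumulation described above, here is a minimal sketch in plain Python. It is a toy one-parameter example, not OpenNMT's trainer code; the function name `sgd_with_accumulation`, the squared-error loss, and the learning rate are all assumptions made for illustration.

```python
def sgd_with_accumulation(mini_batches, accum_count, lr=0.1, w=0.0):
    """Toy SGD on the loss (w - x)^2 with gradient accumulation.

    Hypothetical helper for illustration only.
    Returns the final weight and the number of optimizer updates.
    """
    grad_sum = 0.0   # gradients accumulated since the last update
    n_samples = 0    # samples seen since the last update
    updates = 0
    for i, batch in enumerate(mini_batches, start=1):
        # Accumulate gradients for this mini-batch; w is not changed yet.
        grad_sum += sum(2.0 * (w - x) for x in batch)
        n_samples += len(batch)
        # Update only once every accum_count mini-batches.
        if i % accum_count == 0:
            w -= lr * grad_sum / n_samples
            grad_sum, n_samples = 0.0, 0
            updates += 1
    # Any leftover mini-batches (fewer than accum_count) are ignored here.
    return w, updates


small = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
large = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
w_accum, n_accum = sgd_with_accumulation(small, accum_count=2)
w_large, n_large = sgd_with_accumulation(large, accum_count=1)
```

In this toy setup, four mini-batches of 2 samples with accum_count = 2 give the same number of updates and the same final weight as two batches of 4 samples with no accumulation.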
The documentation says that accum_steps are
"Steps at which accum_count values change"
Why would we change the accum_count? And at what kind of step?
This is to set a specific schedule. E.g. using accum_count = 2 for the first 5k training steps and then switching to accum_count = 4 would look like this:
accum_count: [2, 4]
accum_steps: [0, 5000]
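To make the schedule concrete, here is a small Python sketch of how the active accum_count could be looked up from the two lists. The helper `accum_count_at` is hypothetical and is not OpenNMT's code. In particular, whether the switch happens exactly at step 5000 or one step later is an implementation detail that this sketch does not claim to match.

```python
from bisect import bisect_right


def accum_count_at(step, accum_count, accum_steps):
    """Return the accum_count value in effect at a given training step.

    Hypothetical helper: accum_steps must be sorted in ascending order
    and have the same length as accum_count.
    """
    if len(accum_count) != len(accum_steps):
        raise ValueError("accum_count and accum_steps must have the same length")
    # Index of the last schedule entry whose start step is <= step.
    idx = bisect_right(accum_steps, step) - 1
    # Before the first listed step, fall back to the first value.
    return accum_count[max(idx, 0)]


schedule_counts = [2, 4]
schedule_steps = [0, 5000]
early = accum_count_at(100, schedule_counts, schedule_steps)
late = accum_count_at(12000, schedule_counts, schedule_steps)
```

Here `early` uses the first entry of the schedule and `late` uses the second, so the effective batch size doubles once training passes the 5000-step mark.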
why would we change the accum_count?
Nobody forces you to. It's just a possibility. Some published work uses batch sizes that vary during training, and this is a way to replicate that.