I have 126 million sentence pairs and 8 GPUs. I am training with a batch size of 128, batch type = sents, and accum count = 8. How should I calculate how much of the training data has been processed? My understanding is this: because accum count = 8, one step (an optimizer update of the weights) is taken every 8 batches. The batch size on each GPU is 128, so one batch across all 8 GPUs is 1024 sentences. Does this mean the model is updated once every 8192 sentences (1024 × 8), and that the whole training set is processed after roughly 15.4k steps (126M / 8192)?
I would greatly appreciate any help.
Why do you want to use accum_count?
It is usually used to get large effective batches on single-GPU machines.
Since you have an 8-GPU machine, don't bother.
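For reference, the arithmetic in the question can be checked with a small sketch. It assumes that with batch type = sents the batch size of 128 applies per GPU, that every batch is full, and that no sentence pairs are filtered out before training; the function names are made up for illustration and are not part of any library.

```python
import math


def sentences_per_step(batch_size: int, num_gpus: int, accum_count: int) -> int:
    """Sentences consumed per optimizer step.

    Assumes batch_size is counted in sentences and applies per GPU.
    """
    return batch_size * num_gpus * accum_count


def steps_per_epoch(num_sentences: int, batch_size: int,
                    num_gpus: int, accum_count: int) -> int:
    """Optimizer steps needed to see every sentence once (last step may be partial)."""
    per_step = sentences_per_step(batch_size, num_gpus, accum_count)
    return math.ceil(num_sentences / per_step)


# Settings from the question: 126M pairs, 8 GPUs, batch size 128, accum count 8.
with_accum = steps_per_epoch(126_000_000, 128, 8, 8)

# Same setup with accum count 1, as suggested in the reply.
without_accum = steps_per_epoch(126_000_000, 128, 8, 1)

print(sentences_per_step(128, 8, 8))  # 8192
print(with_accum)                     # 15381
print(without_accum)                  # 123047
```

So under these assumptions one pass over the data takes about 15.4k steps with accum count = 8, or about 123k steps with accum count = 1. The fraction of data seen at a given step is approximately step × sentences per step / 126M.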