Training Progress

ahmedbahaaeldin1 · December 26, 2021, 11:29pm

Hello,
I have 126 million sentence pairs and 8 GPUs, and I am training with a batch size of 128 (batch type = sents) and accum count = 8. How should I calculate how much of the training data has been processed? My understanding is that with accum count = 8, one optimizer step (weight update) is taken every 8 batches, and with a batch size of 128 on each GPU, one batch across all 8 GPUs is 1024 sentences. Does this mean the model is updated once every 8192 sentences, so that the whole training set is processed after roughly 15.4k steps?
I would highly appreciate any help.

vince62s · January 3, 2022, 4:49pm

Why do you want to use accum_count?

Usually we use it to get large batches on single-GPU machines.

Since you have an 8-GPU machine, don’t bother.
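
For reference, the arithmetic in the question can be checked with a short sketch. It assumes that every GPU consumes its own full batch of 128 sentences per forward pass (as the question describes), that no sentence pairs are filtered out, and that a final partial batch still counts as one step. The function names are hypothetical helpers for illustration, not part of any library, and the real trainer may behave differently (for example, if it filters long sentences or weights several corpora).

```python
import math


def sentences_per_step(batch_size, num_gpus, accum_count=1):
    """Sentences consumed per optimizer step (assumes full batches on every GPU)."""
    return batch_size * num_gpus * accum_count


def steps_per_epoch(num_examples, batch_size, num_gpus, accum_count=1):
    """Optimizer steps needed to see every example once (last partial step rounded up)."""
    per_step = sentences_per_step(batch_size, num_gpus, accum_count)
    return math.ceil(num_examples / per_step)


corpus_size = 126_000_000  # sentence pairs, from the question

# Setup from the question: 8 GPUs, batch size 128, accum count 8.
print(sentences_per_step(128, 8, accum_count=8))            # 8192
print(steps_per_epoch(corpus_size, 128, 8, accum_count=8))  # 15381

# Same hardware without gradient accumulation, as the reply suggests.
print(sentences_per_step(128, 8, accum_count=1))            # 1024
print(steps_per_epoch(corpus_size, 128, 8, accum_count=1))  # 123047
```

Under these assumptions, the fraction of the data processed after a given number of steps is approximately `steps * sentences_per_step(...) / corpus_size`. Dropping accumulation does not change how many sentences make up one pass over the data, only how many sentences contribute to each update.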