Epoch and step relation

I am not sure what the relation between steps and epochs is. Concretely, how can I calculate how many times the whole training set was seen during training?

I would like to run validation at least 2 times per epoch, but I don't know how many steps I should set.

I thought the relation was the following:
epochs = steps * batch_size / len(training_set)
Is that correct?
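Assuming the batch size counts samples (no gradient accumulation), the relation can be checked with a small script; the dataset size and batch size below are made-up numbers for illustration:

```python
# Relation between steps, batch size, and epochs, assuming the batch
# size counts samples and there is no gradient accumulation.
import math

len_training_set = 10_000   # number of training examples (assumed)
batch_size = 32             # assumed

# One epoch = one full pass over the training set.
steps_per_epoch = math.ceil(len_training_set / batch_size)

def epochs_seen(steps: int) -> float:
    """How many full passes over the data a given step count amounts to."""
    return steps * batch_size / len_training_set

# To validate twice per epoch, evaluate every half epoch:
eval_every = steps_per_epoch // 2

print(steps_per_epoch)   # 313
print(epochs_seen(626))  # 2.0032, i.e. roughly 2 epochs
print(eval_every)        # 156
```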

Also, I saw this post Epochs Determination where the number of available GPUs influences the step count.

Regards, and thanks in advance


It’s not that easy when the batch size is expressed in terms of number of tokens (the default when training Transformer models).
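When the batch size is measured in tokens, each batch holds a varying number of sequences, so steps per epoch can only be estimated from the total token count. A rough sketch, with all figures assumed for illustration:

```python
# Estimating steps per epoch when the batch size is expressed in tokens,
# as is common when training Transformer models. Numbers are assumptions.
total_tokens_in_dataset = 5_000_000  # sum of token lengths over all examples
tokens_per_batch = 4_096             # token-based batch size

# Each optimizer step consumes roughly tokens_per_batch tokens, so:
approx_steps_per_epoch = total_tokens_in_dataset / tokens_per_batch
print(round(approx_steps_per_epoch))  # ~1221 steps per epoch
```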

If you want more control over the epochs, I suggest running the training epoch by epoch as demonstrated in the documentation:
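A framework-agnostic sketch of what such an epoch-by-epoch loop looks like, with validation twice per epoch; all callables here are placeholders, not the actual snippet from the documentation:

```python
# Sketch: train epoch by epoch, validating mid-epoch and at epoch end.
# train_step and evaluate are placeholder callables (assumptions).
def train(train_batches, val_batches, n_epochs, train_step, evaluate):
    history = []
    steps_per_epoch = len(train_batches)
    eval_points = {steps_per_epoch // 2, steps_per_epoch}  # twice per epoch
    for epoch in range(n_epochs):
        for step, batch in enumerate(train_batches, start=1):
            train_step(batch)
            if step in eval_points:
                history.append((epoch, step, evaluate(val_batches)))
    return history

# Toy usage: "training" just records batches, "evaluation" returns the count.
seen = []
log = train(
    train_batches=list(range(10)),
    val_batches=[],
    n_epochs=2,
    train_step=seen.append,
    evaluate=lambda v: len(seen),
)
print(log)  # validation ran at steps 5 and 10 of each epoch
```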