Question about parameters: train_steps, queue_size, bucket_size

marwagaser · September 16, 2021, 5:22pm

I am new to neural MT and am trying to understand the parameters used to train a Transformer model. I understand that there is no longer an epoch parameter and that, as per this discussion, train_steps replaces it. My understanding is that with a batch size of 4096 sentences and 100k training sentences, one epoch takes 100,000 / 4096 ≈ 24.4, i.e. about 25 train_steps, so setting train_steps to 74 would give roughly 3 epochs. Am I right?
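The epoch arithmetic in the first question can be checked with a short Python sketch. It assumes batch_type is sents, a single GPU and no gradient accumulation, so that one train step consumes exactly batch_size sentences; the function names are hypothetical helpers, not part of any toolkit.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Steps needed to see every example once (the last, partial batch still costs a step)."""
    return math.ceil(num_examples / batch_size)

def steps_for_epochs(num_examples: int, batch_size: int, epochs: float) -> int:
    """Steps for a (possibly fractional) number of epochs, rounded up."""
    return math.ceil(epochs * num_examples / batch_size)

print(steps_per_epoch(100_000, 4096))      # 25
print(steps_for_epochs(100_000, 4096, 3))  # 74
```

Rounding the total (74) rather than multiplying the rounded per-epoch figure (3 × 25 = 75) avoids counting the partial last batch three times.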
Also, if I am training on 4 GPUs instead of 1, how should train_steps change if I want to train for, say, 3 epochs?
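A minimal sketch of how the step count would scale under data-parallel training, assuming that each GPU processes its own batch of batch_size examples per step and that gradients may be accumulated over accum_count batches before each optimizer update. Both of these are assumptions to verify against the toolkit's documentation; train_steps_for_epochs is a hypothetical helper.

```python
import math

def train_steps_for_epochs(num_examples, batch_size, epochs, n_gpus=1, accum_count=1):
    # Assumption: every GPU consumes one batch of `batch_size` examples per forward pass,
    # and `accum_count` such passes are accumulated into a single optimizer step.
    examples_per_step = batch_size * n_gpus * accum_count
    return math.ceil(epochs * num_examples / examples_per_step)

print(train_steps_for_epochs(100_000, 4096, 3))            # 74 (1 GPU)
print(train_steps_for_epochs(100_000, 4096, 3, n_gpus=4))  # 19 (4 GPUs)
```

Under these assumptions, going from 1 to 4 GPUs divides the number of steps per epoch by roughly 4.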
And if my batch_type is tokens rather than sents, how do I calculate the number of steps?
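With token-based batching, a rough estimate is the total number of tokens in the corpus divided by the tokens consumed per step. The sketch below assumes the training file is already tokenised (so whitespace splitting counts tokens) and ignores padding and the toolkit's exact rule for counting tokens in a batch, so the real number of steps per epoch may differ; the helper names are hypothetical.

```python
import math

def count_tokens(lines):
    # Assumption: text is already tokenised / subword-segmented, one sentence per line.
    return sum(len(line.split()) for line in lines)

def steps_per_epoch_tokens(lines, tokens_per_batch, n_gpus=1, accum_count=1):
    total_tokens = count_tokens(lines)
    # Same scaling assumption as for sentence batches: each GPU takes one batch per pass.
    return math.ceil(total_tokens / (tokens_per_batch * n_gpus * accum_count))

corpus = ["the cat sat", "a dog", "hello world again today"]  # 9 tokens in total
print(steps_per_epoch_tokens(corpus, tokens_per_batch=4))      # 3
```

For a real corpus, the lines can come straight from the file, e.g. `with open("train.src", encoding="utf-8") as f: steps_per_epoch_tokens(f, 4096)`, where `train.src` is a placeholder path.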
What does queue_size do? I have read the docs, but I still do not really understand it.
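As a generic illustration (not OpenNMT-py's code), here is a bounded producer/consumer queue in Python. Assuming queue_size plays the role of `maxsize` here, it caps how many prepared batches can wait in memory ahead of the training loop: the producer blocks once the queue is full and resumes as the consumer takes batches out.

```python
import queue
import threading

def producer(q, batches):
    for b in batches:
        q.put(b)   # blocks while the queue already holds `maxsize` items
    q.put(None)    # sentinel: no more batches

def consume_all(batches, queue_size):
    q = queue.Queue(maxsize=queue_size)
    t = threading.Thread(target=producer, args=(q, batches))
    t.start()
    seen = []
    while True:
        b = q.get()
        if b is None:
            break
        seen.append(b)  # a training step would run here
    t.join()
    return seen

print(consume_all([f"batch{i}" for i in range(5)], queue_size=2))
# ['batch0', 'batch1', 'batch2', 'batch3', 'batch4']
```

A larger queue lets data preparation run further ahead of training at the cost of more memory; it does not change which batches are seen.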
Is bucket_size only for dynamic data? That is, if my data does not change while I am training, do I not need it?
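In many NMT data pipelines, a "bucket" is a buffer of examples that is read in, sorted by length and cut into batches, so that sentences of similar length share a batch and padding is reduced. The sketch below is a generic, hypothetical illustration of that idea, assuming bucket_size is the number of examples held in the buffer; it is not OpenNMT-py's implementation.

```python
import random

def bucketed_batches(examples, bucket_size, batch_size, seed=0):
    """Yield batches built from consecutive buffers ("buckets") of `bucket_size` examples."""
    rng = random.Random(seed)
    bucket = []
    for ex in examples:
        bucket.append(ex)
        if len(bucket) == bucket_size:
            yield from _batches_from_bucket(bucket, batch_size, rng)
            bucket = []
    if bucket:  # leftover examples form a final, smaller bucket
        yield from _batches_from_bucket(bucket, batch_size, rng)

def _batches_from_bucket(bucket, batch_size, rng):
    bucket = sorted(bucket, key=len)  # similar lengths end up in the same batch
    batches = [bucket[i:i + batch_size] for i in range(0, len(bucket), batch_size)]
    rng.shuffle(batches)              # avoid always training short-to-long within a bucket
    return batches
```

A larger bucket gives the sort more examples to group by length (less padding) but holds more data in memory at once.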