I did not find any information in the forum about how sentences are shuffled during training.
I found this parameter, but I am not sure how it works:
# (optional) The number of elements from which to sample during shuffling (default: 500000).
# Set 0 or null to disable shuffling, -1 to match the number of training examples.
Could someone explain the behavior of this parameter?
Is the shuffling done over all sentences or at the batch level?
Regards, and thanks in advance.
Thanks a lot for the reply.
I don’t understand when the shuffling is done. If I have 10M sentences, are the first 500k sentences taken and shuffled, and the batches drawn from these 500k? And then the next 500k sentences are taken and shuffled, and so on?
So, if I have 10M sentences and set the parameter to 10M, is the shuffling done after each epoch?
In practice, it’s not exactly the first 500k: when the shuffle buffer size is smaller than the dataset size, training splits the dataset into 10M / 500k = 20 shards and visits them in a random order.
Yes. If you have enough memory, it’s best to set the buffer to the size of the training dataset so that you get a uniform shuffle.
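For illustration, here is a minimal Python sketch of the behavior described in this reply. It assumes the buffer works like a standard streaming shuffle buffer (the approach used by TensorFlow's `tf.data.Dataset.shuffle`): fill a buffer with `buffer_size` elements, then repeatedly emit a random element and replace it with the next one from the input. The helpers `buffer_shuffle` and `shuffled_epoch` are hypothetical and are not OpenNMT-tf's actual code. A toy list of integers stands in for the sentences.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(items: Iterable[T], buffer_size: int, rng: random.Random) -> Iterator[T]:
    """Hypothetical streaming shuffle: hold `buffer_size` elements, emit a random one each step."""
    if buffer_size < 1:
        # Mirrors "0 disables shuffling": pass elements through unchanged.
        yield from items
        return
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def shuffled_epoch(dataset: List[T], buffer_size: int, seed: int = 0) -> List[T]:
    """Hypothetical helper: visit shards in random order, then buffer-shuffle the stream."""
    rng = random.Random(seed)
    if buffer_size < 1:
        return list(dataset)
    # Split into ceil(len / buffer_size) contiguous shards (e.g. 10M / 500k = 20).
    shards = [dataset[i:i + buffer_size] for i in range(0, len(dataset), buffer_size)]
    rng.shuffle(shards)  # the order in which shards are visited is random
    stream = (x for shard in shards for x in shard)
    return list(buffer_shuffle(stream, buffer_size, rng))


data = list(range(20))
small = shuffled_epoch(data, buffer_size=5, seed=1)           # 4 shards of 5 elements
full = shuffled_epoch(data, buffer_size=len(data), seed=1)    # 1 shard: uniform permutation
print(sorted(small) == data, sorted(full) == data)  # True True
```

In this sketch, an element emitted at output position `i` always comes from stream position `i + buffer_size - 1` or earlier, so a small buffer only mixes elements locally (random shard order adds coarse mixing on top). When `buffer_size` equals the dataset size, the whole dataset sits in the buffer and the result is a uniform random permutation, which is why a full-size buffer is preferable when memory allows.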