What is ONMT really doing with the training data? The log seems to report sorting by size. Are the sentences always trained in order of size?
This process comes from 2 constraints at the batch level:
- shuffling: sentences within a batch should come from different parts of the corpus
- sorting: sentences within a batch should have the same source length (i.e. so that no padding is needed on the source side)
Then, during training, the batches are visited in random order, so the whole corpus is seen in random order. It just happens that the sentences within each batch have the same source length.
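
To make the idea concrete, here is a minimal Python sketch of the process described above. It is not ONMT's actual code: the function name `make_batches`, the `(source, target)` pair layout, and the `batch_size` and `seed` parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_batches(corpus, batch_size, seed=0):
    """Hypothetical helper: corpus is a list of (source_tokens, target_tokens) pairs."""
    rng = random.Random(seed)

    # Shuffling: mix the corpus first, so each batch draws its sentences
    # from different parts of the corpus.
    examples = list(corpus)
    rng.shuffle(examples)

    # Sorting: group sentences by source length, so every batch holds
    # sources of one length and needs no source-side padding.
    by_length = defaultdict(list)
    for src, tgt in examples:
        by_length[len(src)].append((src, tgt))

    # Cut each length group into batches of at most batch_size sentences.
    batches = []
    for group in by_length.values():
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])

    # Randomly order the batches, so training does not go from short
    # sentences to long ones.
    rng.shuffle(batches)
    return batches
```

With something like this, one epoch would simply iterate over the returned list: the source length is constant inside each batch, but it varies randomly from one batch to the next.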