Hello, I am trying to compare OpenNMT training speed to a custom script I wrote from scratch. How can I know that the OpenNMT script has made a full pass through the whole training dataset? If I have only one corpus, does the log line "[2021-12-19 04:17:46,034 INFO] Weighted corpora loaded so far: * corpus_1: 12" mean my corpus has been processed 12 times?
my corpus has been processed 12 times?
Yes and no. It means the corpus has been loaded 12 times, but loaded does not mean processed. There is a pooling mechanism that loads the equivalent of N batches, sorts the examples by length, builds batches, shuffles them, and finally queues them for training, so the counter can be ahead of what training has actually consumed.
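To make the pooling idea concrete, here is a rough sketch in plain Python. It is not OpenNMT's actual code: `pooled_batches`, `batch_size`, and `pool_factor` are names made up for illustration, and batches here are counted in examples rather than tokens.

```python
import random
from itertools import islice


def pooled_batches(examples, batch_size, pool_factor, seed=0):
    """Yield batches via a load / sort / batch / shuffle pool (illustrative only).

    `examples` is any iterable of sequences (e.g. lists of tokens).
    `pool_factor` is the hypothetical N: how many batches' worth of
    examples are loaded into the pool at once.
    """
    rng = random.Random(seed)
    stream = iter(examples)
    while True:
        # 1. Load the equivalent of `pool_factor` batches.
        pool = list(islice(stream, batch_size * pool_factor))
        if not pool:
            return
        # 2. Sort by length so each batch holds similar-length examples.
        pool.sort(key=len)
        # 3. Cut the sorted pool into batches.
        batches = [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
        # 4. Shuffle the batches so training does not see them in length order.
        rng.shuffle(batches)
        # 5. Hand them to the consumer (a real trainer would queue them).
        yield from batches
```

The point of the sketch is that the loader works ahead of the trainer: a whole pool is read before the first batch from it is used, so a "loaded" count says little about how many batches have been trained on.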
The easiest metric to compare the two is either the wall time for a fixed number of steps or the tok/sec speed shown in the logs.
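If your custom script does not report tok/sec yet, something like the following could be dropped into its training loop. This is only a sketch: `ThroughputMeter`, `update`, and `approx_passes` are hypothetical names, and you need to count the same kind of tokens (source, target, or both) as the number you compare against in the logs.

```python
import time


class ThroughputMeter:
    """Track steps, tokens, and tokens per second for a training loop."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.start = clock()
        self.tokens = 0
        self.steps = 0

    def update(self, n_tokens):
        # Call once per training step with the token count of that batch.
        self.tokens += n_tokens
        self.steps += 1

    def tokens_per_sec(self):
        elapsed = self.clock() - self.start
        return self.tokens / elapsed if elapsed > 0 else 0.0

    def approx_passes(self, corpus_tokens):
        # Rough number of passes over the data, assuming you know the
        # total token count of the training corpus.
        return self.tokens / corpus_tokens
```

Usage would be `meter = ThroughputMeter()` before the loop and `meter.update(n_tokens_in_batch)` after each step. Comparing wall time per step only makes sense when both scripts use the same batch size, which is why tok/sec is usually the fairer number.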