Hi, training slows down as it progresses through the mini-batches within an epoch. For example, 100 mini-batches took less than 1 minute towards the beginning of training, whereas towards the end, 20 mini-batches take more than 7 minutes. Is this behavior common? (My dataset is large: ~90 hours of speech, 20,800 mini-batches with a batch size of 8.)
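In case it's useful, here is a minimal sketch of how one could wrap the batch iterator to log wall-clock time per chunk of batches and confirm the slowdown; the helper `time_in_chunks` is just something I put together for illustration, not part of the toolkit:

```python
import time

def time_in_chunks(iterable, chunk_size=100):
    """Yield items while printing wall-clock time per `chunk_size` items --
    handy for spotting a slowdown late in an epoch."""
    start = time.time()
    for i, item in enumerate(iterable, 1):
        yield item
        if i % chunk_size == 0:
            elapsed = time.time() - start
            print(f"batches {i - chunk_size + 1}-{i}: {elapsed:.1f}s")
            start = time.time()

# Placeholder loop standing in for the real training loop:
for batch in time_in_chunks(range(500), chunk_size=100):
    pass  # the training step would go here
```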
I was using curriculum learning for this experiment. Is the “source tokens” value in the log where I should be able to see the source-token counts increasing over the course of the epoch (i.e., confirming the curriculum ordering)?
Also, I am trying to evaluate this experiment with TER. My log still reports perplexity during training. Is that expected?
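For now I'm scoring TER offline on the generated hypotheses; a minimal sketch, assuming sacrebleu (which ships a TER metric) is installed, with `hyps` and `refs` as placeholder lists of strings:

```python
from sacrebleu.metrics import TER

hyps = ["das ist ein test", "noch ein satz"]
refs = ["das ist ein test", "noch ein beispiel satz"]

ter = TER()
result = ter.corpus_score(hyps, [refs])  # refs wrapped in a list: one reference set
print(result)  # e.g. "TER = 20.00 ..."
```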
Finally, in this experiment, after the first epoch finished training, the evaluation on the dev set seemed to enter an infinite loop, using 100% CPU and 100% memory but no GPU. There are no error logs, so I'm not sure what caused this.
I realize I've packed a lot into this post, but I'd appreciate answers to all of the questions.