@guillaumekln @francoishernandez
Hi everyone:
How do I work out how many times each of my sentences is trained on (or whether it is even seen at least once)?
I really have no idea which of these parameters (below) to use to work that out.
Hey Paul,
The easiest way is probably to check in your logs how many times each shard has been loaded.
(Loading dataset from.... your_dataset.X.pt) --> X being the shard id.
If shard X has been loaded Y times, then what it contains has been seen Y times (or maybe Y-1 if you account for batches that are still sitting in the queue).
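Something like this quick script should tally the loads per shard (just a sketch: it assumes the log lines look like the one above, and `train.log` is a placeholder for your actual log file):

```python
import re
from collections import Counter

shard_loads = Counter()
# Match the shard id X in lines like "Loading dataset from ... your_dataset.X.pt"
pattern = re.compile(r"Loading dataset from.*?\.(\d+)\.pt")

with open("train.log") as f:          # placeholder path, point it at your training log
    for line in f:
        m = pattern.search(line)
        if m:
            shard_loads[int(m.group(1))] += 1

for shard_id in sorted(shard_loads):
    print(f"shard {shard_id}: loaded {shard_loads[shard_id]} times")
```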
Thanks.
OK, so that’s about 1000 sentences per training step in my runs on en-de WMT14.
And each sentence is seen about 40 times in 200K steps.
Interesting round number.
But it’s not in any of the parameters above!
I could play numerology...
batch_size/accum_count = ~1000...
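For what it's worth, here is the back-of-the-envelope version (the corpus size, token-based batch settings, and average sentence length below are assumptions for illustration, not read from my config):

```python
# Rough arithmetic behind the ~40 passes figure.
corpus_sentences = 4.5e6        # approx. WMT14 en-de sentence pairs (assumption)
sentences_per_step = 1000       # estimated from the shard-load counts
train_steps = 200_000

passes_over_data = sentences_per_step * train_steps / corpus_sentences
print(f"each sentence seen ~{passes_over_data:.0f} times")   # ~44

# If batching is token-based, sentences per step is roughly
# batch_size (tokens) * accum_count * num_gpus / avg tokens per sentence,
# which is one way a ~1000 figure could fall out of the parameters.
batch_size_tokens = 4096        # assumption
accum_count = 2                 # assumption
world_size = 4                  # assumption
avg_tokens_per_sentence = 30    # assumption
est = batch_size_tokens * accum_count * world_size / avg_tokens_per_sentence
print(f"~{est:.0f} sentences per step under these assumed settings")
```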
BTW, this is all interesting.
I found that the recall (training-set BLEU) on the late part of my training set is 4 BLEU above the test set, whereas on the early part it is no better than the test set.
Surprising, given that the data has already been seen 39 other times!
But I guess it makes sense that the most recent data leaves a larger memory effect in the parameters, since each batch effectively partially erases earlier training.