Makes sense… So to make sure I use everything at least once, I should set the sample size to:
a/.7 + b/.2 + c/.1
where a, b, and c are the sizes (in # of segments) of the respective sets.
I would rather say: Max(a/.7, b/.2, c/.1)
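To illustrate why the max is the right choice, here is a minimal sketch. The set sizes are hypothetical; the weights (.7/.2/.1) are the ones from this thread. Each set contributes `weight * sample_size` segments per epoch, so a set is fully covered once `sample_size >= size / weight`, and the smallest sample size covering all sets is the max of those ratios:

```python
# Hypothetical set sizes in # of segments, with the weights from the thread.
weights = {"a": 0.7, "b": 0.2, "c": 0.1}
sizes = {"a": 700_000, "b": 50_000, "c": 30_000}  # hypothetical values

# Set X is seen in full once sample_size >= size_X / weight_X,
# so take the max over all sets (not the sum).
sample_size = max(sizes[k] / weights[k] for k in weights)
print(int(sample_size))  # → 1000000 (driven by set "a": 700000 / 0.7)
```

Note that summing the ratios instead would overshoot: it would guarantee coverage but sample far more than needed.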
But, since I was a bit lost by the previous 3.5 factor…
Ah, I think you’re right.
No, unfortunately - it is not implemented for the moment. It is more complicated since idx_files can be misaligned, and it takes more work (several passes over the files) to make an efficient implementation. Please open an issue on GitHub if you need it.
I can’t find this in the documentation. Is this the same as -decay_method reset?
Yes, this is the revised name.
Whew, thanks!
Just to give an update on the previous run (with -max_batch_size 196): after 100 epochs of training (so 300,000,000 sentences fed to the training), the final PPL is 42.59, for about 7 days of training.
Did you plot the PPL curve? I am interested.
Thanks!
What’s the hardware set-up for this?