A5U7 (Ayush Utkarsh):
I have been noticing a reduction in the size of new checkpoints when retraining on a new dataset from a saved (custom) pretrained checkpoint.
Pretrained weight: here the size drops from 108M to just 36M.
train.py was run with the --train_from argument pointing at the 108M checkpoint file.
Also, while it's beside the point, there's no change in the dictionary, since I am using a char-level model.
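One way to see where the bytes actually go is to load the checkpoint and serialize each top-level entry separately. This is only a sketch: the filename and the top-level key names (e.g. 'model', 'optim') are assumptions about how the saved checkpoint is laid out, not something confirmed in this thread.

```python
# Sketch: measure how many MB each top-level part of a checkpoint takes.
# The checkpoint path and key layout are assumptions, not from this thread.
import io
import torch

def section_size_mb(obj):
    """Serialize one part of the checkpoint and report its size in MB."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    return buf.getbuffer().nbytes / 1e6

ckpt = torch.load("model_step_XXXX.pt", map_location="cpu")  # hypothetical filename
for key, value in ckpt.items():
    print(f"{key}: {section_size_mb(value):.1f} MB")
```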
Did you change the optimization method?
A5U7 (Ayush Utkarsh):
I have switched from Adam to SGD. Could that be the reason? The size drops to about a third of the previous checkpoint.
Yes, that's the reason. Adam keeps two additional state tensors (the first- and second-moment estimates) for each model weight, so the saved optimizer state roughly triples the checkpoint size; plain SGD stores no such per-weight state.
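A toy illustration of the size difference, as a sketch in plain PyTorch rather than the actual training script: the same model is saved together with its optimizer state under Adam and under SGD, and the Adam version comes out roughly three times larger because of the per-parameter moment tensors.

```python
# Toy comparison: Adam's per-parameter moments vs. SGD's (empty) state.
import io
import torch

model = torch.nn.Linear(1000, 1000)  # ~1M parameters

def saved_size_mb(state_dict):
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    return buf.getbuffer().nbytes / 1e6

for opt_cls in (torch.optim.Adam, torch.optim.SGD):
    opt = opt_cls(model.parameters(), lr=0.01)
    loss = model(torch.randn(8, 1000)).sum()
    loss.backward()
    opt.step()  # populates the optimizer state (exp_avg/exp_avg_sq for Adam)
    total = saved_size_mb(model.state_dict()) + saved_size_mb(opt.state_dict())
    print(f"{opt_cls.__name__}: ~{total:.1f} MB (model + optimizer state)")
```

With momentum-free SGD the optimizer state is essentially empty, so the saved file is close to the bare model weights, which matches the drop from 108M to roughly a third of that.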