Reduction in size of checkpoints: `--train_from` arg

I have been noticing a reduction in size of new checkpoints when retraining on a new dataset from a saved (custom) pretrained version.

Pretrained weight:

Here the size drops from 108M to just 36M.
Below is the run with the `--train_from` arg set to the 108M file.

Also, while it's beside the point, there’s no change in the dictionary, as I am using a char-level model.

Did you change the optimization method?

I have switched from Adam to SGD. Could that be the reason? I ask because the new checkpoint is one third the size of the previous one.

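Yes, that’s the reason. Adam keeps two extra values for each model weight (running estimates of the first and second moments of the gradient), and these are saved in the checkpoint along with the weights. Plain SGD without momentum keeps no such state, so the file shrinks to roughly a third.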
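As a rough sanity check on the numbers, here is a back-of-the-envelope sketch in plain Python. It assumes 32-bit floats, that the checkpoint stores the optimizer state next to the weights, and that SGD is run without momentum. The weight count is a hypothetical value picked to land near 36M; it is not taken from the actual model, and the function names are made up for illustration.

```python
# Extra per-weight values each optimizer keeps between steps.
# Adam: first- and second-moment estimates. Plain SGD (no momentum): none.
OPTIMIZER_STATE_SLOTS = {"adam": 2, "sgd": 0}


def checkpoint_values(num_weights: int, optimizer: str) -> int:
    """Count the numbers stored: the weights plus the optimizer state."""
    return num_weights * (1 + OPTIMIZER_STATE_SLOTS[optimizer])


def checkpoint_megabytes(num_weights: int, optimizer: str,
                         bytes_per_value: int = 4) -> float:
    """Approximate file size, ignoring dictionaries and other metadata."""
    return checkpoint_values(num_weights, optimizer) * bytes_per_value / 1e6


# Hypothetical model size (assumption, not from the thread).
num_weights = 9_000_000

adam_mb = checkpoint_megabytes(num_weights, "adam")
sgd_mb = checkpoint_megabytes(num_weights, "sgd")
print(f"Adam: {adam_mb} MB, SGD: {sgd_mb} MB, ratio: {adam_mb / sgd_mb}")
```

Under these assumptions the ratio is always 3 regardless of model size, which matches the 108M to 36M drop reported above.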
