Correct me if I’m wrong, but there doesn’t seem to be an option to start saving models only after, say, 10 epochs. The only options are to save intermediate models or to save a model every X epochs.

I would want to save models 11 to 15, but I usually don’t care about the first 10. Right now I have a script that removes them to save disk space (they can get quite big), but it would be nice if this were a command line option. It’s a small thing, though.
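For anyone who wants the same workaround, here is a minimal sketch of such a cleanup script. It assumes the saved model file names contain `_epoch<N>_` (for example `model_epoch3_12.50.t7`); that pattern and both function names are my own illustration, so adjust them to whatever your files actually look like.

```python
import re
from pathlib import Path

# Assumed naming scheme: "<prefix>_epoch<N>_<ppl>.t7". Adjust to your files.
EPOCH_PATTERN = re.compile(r"_epoch(\d+)_")


def early_epoch_files(names, first_kept_epoch):
    """Return the file names whose epoch number is below first_kept_epoch."""
    selected = []
    for name in names:
        match = EPOCH_PATTERN.search(name)
        # Files that do not match the pattern are never selected.
        if match and int(match.group(1)) < first_kept_epoch:
            selected.append(name)
    return selected


def prune_directory(directory, first_kept_epoch):
    """Delete early-epoch model files in `directory` (hypothetical helper)."""
    folder = Path(directory)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    for name in early_epoch_files(names, first_kept_epoch):
        (folder / name).unlink()
```

Calling `prune_directory("models", 11)` would remove the files for epochs 1 to 10 and leave everything else in place. It is worth printing the result of `early_epoch_files` first to check the selection before deleting anything.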

I think that would be too risky: if the checkpoint gets corrupted or deleted for whatever reason before an epoch has been saved, you could lose countless hours of training. Such an option would make sense, and be safe, if it always kept at least the last model plus the current checkpoint.
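To make the safe variant concrete, here is a rough sketch of the retention rule I have in mind. The function name and the `start_at` parameter are hypothetical and not an existing option; the current checkpoint is assumed to be stored separately and is never touched.

```python
def epochs_to_delete(saved_epochs, start_at):
    """Return saved epochs that can be removed without losing the newest model.

    Epochs before `start_at` are candidates for deletion, but the most
    recently saved epoch is always kept, even if it is an early one.
    """
    if not saved_epochs:
        return []
    newest = max(saved_epochs)
    return sorted(e for e in saved_epochs if e < start_at and e != newest)
```

With this rule, a run that has only reached epoch 3 still keeps model 3 on disk, and the early models disappear only once a later one has replaced them.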


It’s implemented in OpenNMT-py, though (option ‘start_checkpoint_at’).