I am training an NMT model with the Torch version of OpenNMT. I would like to save the trained model every x iterations, but right now it only saves models every x epochs. This is what my script looks like: `th train.lua -data data/demo-train.t7 -save_model model -save_every 5000`. Any help?
`-save_every 5000` will save a checkpoint every 5000 iterations. Does that work for you?
Is a checkpoint the same as a model? I would like to save the models every 5000 iterations while training is in progress. My data is huge, and I may not even make it through the first epoch, so I want to save models before any epoch completes.
Yes, they are the same.
In that case, the `-save_every 5000` flag is not saving anything.
It should produce a rolling checkpoint whose filename ends with `_checkpoint.t7`.
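If you want to confirm it is being written, a minimal check (assuming the `-save_model model` prefix from the command above, so the file would be named `model_checkpoint.t7`, and assuming it lands in the directory training was started from) is:

```sh
# The rolling checkpoint is overwritten in place every -save_every
# iterations, so its modification time should keep advancing while
# training runs:
ls -lh model_checkpoint.t7
```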