I am training an NMT model with the Torch version of OpenNMT. I would like to save the trained model every x iterations, but right now it only saves models every x epochs. This is what my script looks like: `th train.lua -data data/demo-train.t7 -save_model model -save_every 5000`. Any help?
`-save_every 5000` will save a checkpoint every 5000 iterations. Does that work for you?
Is a checkpoint the same as a model? I would like to save the models every 5000 iterations while training is in progress. My data is huge, and I may not even make it through the first epoch, so I want to save models before any epoch completes.
Yes, they are the same.
In that case, the `-save_every 5000` flag is not saving anything.
It should produce a rolling checkpoint whose filename ends with `_checkpoint.t7`.
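If you want to confirm it is being written, a minimal check (assuming the `-save_model model` prefix from the command above, so the file would be named `model_checkpoint.t7`, and assuming it lands in the directory training was started from) is:

```sh
# The rolling checkpoint is overwritten in place every -save_every
# iterations, so its modification time should keep advancing while
# training runs:
ls -lh model_checkpoint.t7
```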