I was training a translation model, but a power failure restarted my system. The last checkpoint I saved is at training step 90K. Is there any way to resume training from that state?
If you are training with OpenNMT-py, use the train_from option:
--train_from, -train_from
If training from a checkpoint then this is the path to the pretrained model’s state_dict.
Default: “”
Dear @park, thanks for your reply. However, I am getting the error “[Errno 2] No such file or directory: ‘sum_eng-model_90000.pt’”.
My saved model is in the main directory, alongside files such as preprocessing.py and train.py (the default location where OpenNMT saves checkpoints).
I just added -train_from sum_eng-model_90000.pt to my command.
What correction should I make to get this working?
Many thanks in advance.
You need to specify the full path to your model:
-train_from YOUR/MODEL/PATH/sum_eng-model_90000.pt
Please include the path, with no space before the file name.
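To rule out a path problem before launching training, a small check like the one below may help. It is only a sketch: resolve_checkpoint is a hypothetical helper name, not part of OpenNMT-py, and it assumes you run it from the same directory in which you run train.py.

```python
import os


def resolve_checkpoint(path):
    """Return the absolute path of a checkpoint file, or raise if it is missing."""
    # Expand "~" and resolve the path against the current working directory.
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        # Report the working directory too, since relative paths depend on it.
        raise FileNotFoundError(
            "Checkpoint not found: %s (current directory: %s)"
            % (full_path, os.getcwd())
        )
    return full_path
```

For example, calling resolve_checkpoint("data/sum_eng-model_90000.pt") either returns an absolute path that can be passed to -train_from, or raises an error showing where Python actually looked.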
@park Sorry, but I am still not able to resolve it. I moved the sum_eng-model_90000.pt file into the data folder, alongside the train, test, and valid data files, and added -train_from data/sum_eng-model_90000.pt to my command, but I am still getting a file-not-found error. Can you please help a bit?
The file is named sum_eng-model_90000.pt and it is in the data folder.
What should I add to my command?
Please post the full command you are running.
@park Here is the full command I am using:
CUDA_VISIBLE_DEVICES=0 python train.py -src_word_vec_size 200 -tgt_word_vec_size 200 -data data/model -save_model sum_eng-model -save_checkpoint_steps 100 -world_size 1 -gpu_ranks 0 -batch_size 64 -valid_steps 10000 -train_steps 100000 -report_every 50 -train_from data/sum_eng-model_90000.pt
Try
CUDA_VISIBLE_DEVICES=0 python3 train.py -src_word_vec_size 200 -tgt_word_vec_size 200 -data data/model -save_model sum_eng-model -save_checkpoint_steps 100 -world_size 1 -gpu_ranks 0 -batch_size 64 -valid_steps 10000 -train_steps 100000 -report_every 50 --train_from data/sum_eng-model_90000.pt
@park I still get the same error: [Errno 2] No such file or directory: ‘data/sum_eng-model_90000.pt’.
The file is there, and the file name is correct too.
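When the file appears to exist but still cannot be opened, it can be worth listing what is actually on disk. The following is a rough sketch, not an OpenNMT-py utility: list_checkpoints and print_checkpoints are hypothetical names, and the code assumes that the step number is the last run of digits before the .pt extension, as in sum_eng-model_90000.pt.

```python
import os
import re


def list_checkpoints(directory="."):
    """Return (step, filename) pairs for .pt files whose names end in a step number."""
    found = []
    for name in os.listdir(directory):
        # Assumption: the step is the final run of digits right before ".pt".
        match = re.search(r"(\d+)\.pt$", name)
        if match:
            found.append((int(match.group(1)), name))
    # Sort by step so the most recent checkpoint comes last.
    return sorted(found)


def print_checkpoints(directory="."):
    """Print each checkpoint's step and exact file name."""
    for step, name in list_checkpoints(directory):
        # repr() exposes stray spaces or unusual characters in the file name.
        print(step, repr(name))
```

Running print_checkpoints("data") from the directory that contains train.py shows the exact names of the saved checkpoints, which can then be compared character by character with the value given to -train_from.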
Try rebuilding and reinstalling the package:
python3 setup.py build
python3 setup.py install
@park Thanks for your effort. That fixed it; this was a dependency issue.
Okay, nice work.