Error when continuing a training

I would like to continue training an existing model on a new data set. Here is my command:
th train.lua -gpuid 1 -train_from "$dataPath/onmt-model_epoch20_6.32.t7" -continue -epochs 40 -data "$dataPath/onmt-train.t7" -save_model "$dataPath/onmt-model-R"

I get this error (at startup):
/home/dev8/torch/install/bin/luajit: /home/dev8/torch/install/share/lua/5.1/nn/THNN.lua:110: weight tensor should be defined either for all 50004 classes or no classes but got weight tensor of shape: [33687] at /tmp/luarocks_cunn-scm-1-7938/cunn/lib/THCUNN/generic/ClassNLLCriterion.cu:44
stack traceback:
[C]: in function 'v'
/home/dev8/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
...ev8/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:44: in function 'updateOutput'
...ev8/torch/install/share/lua/5.1/nn/ParallelCriterion.lua:23: in function 'forward'
./onmt/modules/Decoder.lua:335: in function 'backward'
./onmt/utils/Memory.lua:37: in function 'optimize'
train.lua:198: in function 'closure'
./onmt/utils/Parallel.lua:86: in function 'launch'
train.lua:181: in function 'trainModel'
train.lua:548: in function 'main'
train.lua:551: in main chunk
[C]: in function 'dofile'
...dev8/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

:confused:

Just a remark…

The documentation linked below describes an “end_epoch” option, but it doesn’t seem to be recognized. Instead, I found that the “epochs” option is recognized, although it isn’t in the documentation:
http://opennmt.net/OpenNMT/details/train/

:slight_smile:

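You can’t change the vocabulary when continuing a training. That is what the error is about: the 50004 and 33687 in the message are two different vocabulary sizes. :wink: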

As for “end_epoch”: it is a recent change. You should update your code.

Could you please give me a link that explains how to properly relaunch a training on a new data set?

I think I have to use “-src_vocab” and “-tgt_vocab” in “preprocess.lua”…

That is correct. Just reuse the .dict files generated with the initial dataset.
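
For reference, the two steps could look roughly like the sketch below. It is not taken from the documentation. It assumes the initial preprocessing was run with -save_data "$dataPath/onmt", so that the dictionaries are "$dataPath/onmt.src.dict" and "$dataPath/onmt.tgt.dict". The new corpus file names and the "onmt-new" prefix are made up for illustration. The small run helper only prints the commands unless DRY_RUN=0 is set, so you can check them before launching anything.

```shell
#!/bin/sh
# Sketch only: paths and file names marked "hypothetical" are assumptions.

# Print the command instead of executing it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dataPath="${dataPath:-data}"
newPrefix="$dataPath/onmt-new"   # hypothetical prefix for the new data set

# Step 1: preprocess the new corpus with the dictionaries of the initial
# run, so that the vocabulary sizes match the saved model.
preprocess() {
  run th preprocess.lua \
    -train_src "$dataPath/new-src-train.txt" \
    -train_tgt "$dataPath/new-tgt-train.txt" \
    -valid_src "$dataPath/new-src-valid.txt" \
    -valid_tgt "$dataPath/new-tgt-valid.txt" \
    -src_vocab "$dataPath/onmt.src.dict" \
    -tgt_vocab "$dataPath/onmt.tgt.dict" \
    -save_data "$newPrefix"
}

# Step 2: continue from the checkpoint on the newly built data package.
# Add the epoch option your version of train.lua recognizes.
continue_training() {
  run th train.lua -gpuid 1 \
    -train_from "$dataPath/onmt-model_epoch20_6.32.t7" -continue \
    -data "$newPrefix-train.t7" \
    -save_model "$dataPath/onmt-model-R"
}

preprocess
continue_training
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 to actually execute them.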
