Error when continuing a training

Etienne38 · January 19, 2017, 8:47am

I would like to continue a training with a new data set. Here is my command:
th train.lua -gpuid 1 -train_from "$dataPath/onmt-model_epoch20_6.32.t7" -continue -epochs 40 -data "$dataPath/onmt-train.t7" -save_model "$dataPath/onmt-model-R"

I get this error at startup:
/home/dev8/torch/install/bin/luajit: /home/dev8/torch/install/share/lua/5.1/nn/THNN.lua:110: weight tensor should be defined either for all 50004 classes or no classes but got weight tensor of shape: [33687] at /tmp/luarocks_cunn-scm-1-7938/cunn/lib/THCUNN/generic/ClassNLLCriterion.cu:44
stack traceback:
    [C]: in function 'v'
    /home/dev8/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
    ...ev8/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:44: in function 'updateOutput'
    ...ev8/torch/install/share/lua/5.1/nn/ParallelCriterion.lua:23: in function 'forward'
    ./onmt/modules/Decoder.lua:335: in function 'backward'
    ./onmt/utils/Memory.lua:37: in function 'optimize'
    train.lua:198: in function 'closure'
    ./onmt/utils/Parallel.lua:86: in function 'launch'
    train.lua:181: in function 'trainModel'
    train.lua:548: in function 'main'
    train.lua:551: in main chunk
    [C]: in function 'dofile'
    ...dev8/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Etienne38 · January 19, 2017, 9:13am

Just a remark…

The documentation here describes an “end_epoch” option, but it doesn’t seem to be recognized. Instead, I found that “epochs” is recognized, although it isn’t in the documentation:
http://opennmt.net/OpenNMT/details/train/

guillaumekln · January 19, 2017, 9:17am

You can’t change the vocabulary when continuing a training.

As for the option name, “end_epoch” is a recent change. You should update your code.

github.com

OpenNMT/OpenNMT/blob/master/CHANGELOG.md


Etienne38 · January 19, 2017, 9:22am

Could you please give me a link that explains how to properly relaunch a training on a new data set?

Etienne38 · January 19, 2017, 9:26am

I think I have to use “-src_vocab” and “-tgt_vocab” in “preprocess.lua”…

guillaumekln · January 19, 2017, 9:32am

That is correct. Just reuse the .dict files generated with the initial dataset.
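
For anyone landing here later, the advice above can be sketched roughly as follows. This is only an illustration under a few assumptions: the initial preprocessing is assumed to have used -save_data "$dataPath/onmt" (so the dictionaries are onmt.src.dict and onmt.tgt.dict), the new corpus file names are hypothetical placeholders, and the run helper with its DRY_RUN variable is made up here so the command is printed for review before anything is executed.

```shell
#!/bin/sh
# Sketch only: the corpus file names below are hypothetical placeholders.
dataPath="${dataPath:-data}"

# Dictionaries produced by the initial preprocessing run
# (assumes that run used -save_data "$dataPath/onmt").
src_dict="$dataPath/onmt.src.dict"
tgt_dict="$dataPath/onmt.tgt.dict"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preprocess the new corpus with the original vocabularies, so the
# vocabulary sizes match the ones the saved model was trained with.
preprocess_new_data() {
  run th preprocess.lua \
    -train_src "$dataPath/new-train.src" \
    -train_tgt "$dataPath/new-train.tgt" \
    -valid_src "$dataPath/new-valid.src" \
    -valid_tgt "$dataPath/new-valid.tgt" \
    -src_vocab "$src_dict" \
    -tgt_vocab "$tgt_dict" \
    -save_data "$dataPath/onmt-new"
}

preprocess_new_data
```

With these names, preprocessing should write "$dataPath/onmt-new-train.t7", which is the file to pass to train.lua through -data in place of the original data set.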