Retraining with new vocabulary: fine-tuning


(Zuzanna Parcheta) #1

I trained a model with some vocabulary. Now I want to fine-tune this model on sentences containing new vocabulary.
My training command is the following:

th /home/torch/OpenNMT/train.lua -data ${path_}/preprocessed-datasets/preprocessed-train.t7 \
-train_from ${path_}/models/_epoch7_19.78.t7 \
-continue \
-rnn_size 512 \
-encoder_type rnn \
-rnn_type LSTM \
-end_epoch 47 \
-max_batch_size 20 \
-save_model ${path_}/models/ \
-layers 1 \
-dropout 0.2 \
-update_vocab merge \
-optim adam \
-learning_rate 0.0002 \
-learning_rate_decay 1.0 \
-src_word_vec_size 512  \
-tgt_word_vec_size 512  \
-gpuid 1

I get this error:

[06/05/18 14:36:03 INFO] Using GPU(s): 1
[06/05/18 14:36:03 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
[06/05/18 14:36:03 INFO] Training Sequence to Sequence with Attention model…
[06/05/18 14:36:03 INFO] Loading data from '/home/German/infreq-exp/it-domain/fine-tuning/bleu100/preprocessed-datasets/preprocessed-train.t7'…
[06/05/18 14:36:03 INFO] * vocabulary size: source = 9412; target = 10017
[06/05/18 14:36:03 INFO] * additional features: source = 0; target = 0
[06/05/18 14:36:03 INFO] * maximum sequence length: source = 50; target = 51
[06/05/18 14:36:03 INFO] * number of training sentences: 11352
[06/05/18 14:36:03 INFO] * number of batches: 591
[06/05/18 14:36:03 INFO] - source sequence lengths: equal
[06/05/18 14:36:03 INFO] - maximum size: 20
[06/05/18 14:36:03 INFO] - average size: 19.21
[06/05/18 14:36:03 INFO] - capacity: 100.00%
[06/05/18 14:36:03 INFO] Loading checkpoint '/home/German/infreq-exp/it-domain/fine-tuning/bleu100/models/_epoch7_19.78.t7'…
[06/05/18 14:36:05 WARNING] Cannot change dynamically option -tgt_word_vec_size. Ignoring.
[06/05/18 14:36:05 WARNING] Cannot change dynamically option -src_word_vec_size. Ignoring.
[06/05/18 14:36:05 INFO] Resuming training from epoch 8 at iteration 1…
[06/05/18 14:36:05 INFO] * new source dictionary size: 9412
[06/05/18 14:36:05 INFO] * new target dictionary size: 10017
[06/05/18 14:36:05 INFO] * old source dictionary size: 26403
[06/05/18 14:36:05 INFO] * old target dictionary size: 26989
[06/05/18 14:36:05 INFO] * Merging new / old dictionaries…
[06/05/18 14:36:05 INFO] Updating the state by the vocabularies of the new train-set…
[06/05/18 14:36:06 INFO] * Updated source dictionary size: 26826
[06/05/18 14:36:06 INFO] * Updated target dictionary size: 27366
[06/05/18 14:36:08 INFO] Preparing memory optimization…
[06/05/18 14:36:08 INFO] * sharing 66% of output/gradInput tensors memory between clones
[06/05/18 14:36:08 INFO] Restoring random number generator states…
[06/05/18 14:36:08 INFO] Start training from epoch 8 to 47…
[06/05/18 14:36:08 INFO]
/home/torch/install/bin/luajit: ./onmt/train/Optim.lua:277: bad argument #2 to 'add' (sizes do not match at /home/torch/extra/cutorch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:217)
stack traceback:
[C]: in function 'add'
./onmt/train/Optim.lua:277: in function 'adamStep'
./onmt/train/Optim.lua:147: in function 'prepareGrad'
./onmt/train/Trainer.lua:272: in function 'trainEpoch'
./onmt/train/Trainer.lua:439: in function 'train'
/home/torch/OpenNMT/train.lua:337: in function 'main'
/home/torch/OpenNMT/train.lua:342: in main chunk
[C]: in function ‘dofile’
/home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Any idea what I am doing wrong?


(Guillaume Klein) #2

You probably don’t want to use -continue in this case. If you set it, the previous optimization states will be restored, but after the vocabulary merge your new model doesn’t have the same number of parameters, so the restored optimizer buffers no longer match.
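To make the size mismatch concrete, here is a minimal standalone sketch (plain Python, not OpenNMT code) of why restoring Adam state after a vocabulary merge fails: the saved moment buffers still have the old embedding size, so the element-wise update cannot be applied. The `adam_step` helper and its names are hypothetical, purely for illustration.

```python
def adam_step(param, grad, m, v, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; every state buffer must match the parameter's size."""
    if len(m) != len(param) or len(v) != len(param):
        # Analogous to the "sizes do not match" error raised by cutorch
        raise ValueError("sizes do not match")
    for i in range(len(param)):
        m[i] = b1 * m[i] + (1 - b1) * grad[i]           # first moment
        v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2      # second moment
        param[i] -= lr * m[i] / (v[i] ** 0.5 + eps)     # parameter update
    return param

old_vocab, new_vocab = 26403, 26826  # sizes reported in the log above

# With -continue, the restored optimizer state has the old vocabulary size...
m_state = [0.0] * old_vocab
v_state = [0.0] * old_vocab

# ...but after -update_vocab merge the embedding row count has grown.
params = [0.0] * new_vocab
grads = [0.0] * new_vocab

try:
    adam_step(params, grads, m_state, v_state)
except ValueError as e:
    print("Adam step failed:", e)
```

Dropping -continue starts the optimizer with fresh, correctly-sized state while still loading the model weights via -train_from.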


(Zuzanna Parcheta) #3

Now it works. Thanks a lot.