I have a model trained on a dataset for a number of epochs (let's call it model_1). After stopping training, I retrain on a subset of the data (let's call that one model_2): after preprocessing, I use -continue and provide the -src_vocab, -tgt_vocab, and -features_vocabs_prefix files from model_1's preprocessing output. I assume this retains the same architecture, vocabulary, and options. However, when I use the `average_models.lua` script to average model_1 and model_2, I get this error:
```
/home/centos/torch/install/bin/lua: tools/average_models.lua:103: bad argument #2 to 'add' (sizes do not match at /home/centos/torch/extra/cutorch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:216)
[C]: in function 'add'
tools/average_models.lua:103: in function 'main'
tools/average_models.lua:116: in main chunk
[C]: in function 'dofile'
…ntos/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
```
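For context, here is a minimal sketch of what I understand the averaging around the failing line to be doing (my own reconstruction from the stack trace, not the actual script): corresponding parameter tensors are accumulated with `add` and then divided by the number of models, so `add` fails as soon as two checkpoints disagree on a tensor's shape.

```lua
require('torch')

-- Sketch only: paramLists is a list of flat parameter-tensor lists, one per
-- model; loading/saving of real checkpoints is omitted.
local function averageParams(paramLists)
  local avg = {}
  -- start from a copy of the first model's parameters
  for i, p in ipairs(paramLists[1]) do
    avg[i] = p:clone()
  end
  -- accumulate the remaining models; tensor:add(other) requires identical
  -- sizes, which is where "sizes do not match" is raised
  for m = 2, #paramLists do
    for i, p in ipairs(paramLists[m]) do
      avg[i]:add(p)
    end
  end
  -- divide by the model count to get the element-wise mean
  for i = 1, #avg do
    avg[i]:div(#paramLists)
  end
  return avg
end
```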
I assumed that the model architecture and parameters were retained, so the two models could be averaged easily, but that doesn't seem to be the case.
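To narrow down which layer diverges, I imagine one could compare the parameter shapes of the two checkpoints before averaging. A hypothetical pre-check (`compareParams` is my own helper, not part of OpenNMT; how to extract the flat parameter lists depends on the checkpoint layout, so I'm leaving that part out):

```lua
require('torch')

-- Print every parameter tensor whose shape differs between the two models,
-- which should pinpoint the layer that makes the average fail.
local function compareParams(params1, params2)
  assert(#params1 == #params2, 'different number of parameter tensors')
  for i = 1, #params1 do
    if not params1[i]:isSameSizeAs(params2[i]) then
      print(string.format('tensor %d: %s vs %s', i,
        table.concat(params1[i]:size():totable(), 'x'),
        table.concat(params2[i]:size():totable(), 'x')))
    end
  end
end
```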
Any explanation or help would be greatly appreciated!