-fix_word_vecs_enc and -fix_word_vecs_dec not doing what I expect them to do

Hi, I trained an LSTM model with a shared vocabulary, using the -share_decoder_embeddings and -share_embeddings options.

The emb_luts embedding layer has 50004 words.

I saved the trained model, then used the saved checkpoint to continue training (-train_from), this time setting the
-fix_word_vecs_enc and -fix_word_vecs_dec flags.

I expected the “fine-tuned” model to have the same embedding matrices as the ones from the base model. However, that was not the case when I actually compared them.

encoder_matrix_og = model_lstm['model']['encoder.embeddings.make_embedding.emb_luts.0.weight'].cpu().numpy()
encoder_matrix_ft = finetuned_lstm['model']['encoder.embeddings.make_embedding.emb_luts.0.weight'].cpu().numpy()

print(encoder_matrix_og)
array([[ 0.15590172, -0.09746376,  0.20437966, ...,  0.17598929,
         0.08156995,  0.31481993],
       [ 0.06886071,  0.09739305,  0.06408393, ...,  0.02725093,
         0.06250075,  0.03845875],
       [-0.08549578, -0.02192731,  0.32102218, ..., -0.14098114,
         0.26072246, -0.11170442],
       ...,
       [ 0.05883738, -0.06367233,  0.08204132, ...,  0.36408105,
        -0.06660978, -0.095727  ],
       [-0.00657062,  0.01990662, -0.03282089, ...,  0.04410684,
         0.06983539, -0.05920906],
       [ 0.0405694 ,  0.11745881,  0.13548265, ..., -0.09362546,
         0.07424163, -0.03483336]], dtype=float32)

print(encoder_matrix_ft)
array([[ 0.02719062, -0.27679086,  0.175478  , ...,  0.08351177,
         0.15322194,  0.12799822],
       [ 0.06887297,  0.09740731,  0.06408892, ...,  0.02722452,
         0.06248574,  0.03845735],
       [-0.08500054,  0.02577828,  0.3388119 , ..., -0.12022047,
         0.25378993, -0.10860381],
       ...,
       [ 0.06116741, -0.060183  ,  0.0839922 , ...,  0.36391202,
        -0.06733748, -0.09531425],
       [-0.00708292,  0.01796832, -0.03329238, ...,  0.04410985,
         0.07168514, -0.05981572],
       [ 0.03849256,  0.11605936,  0.14136563, ..., -0.09195911,
         0.07555307, -0.03628784]], dtype=float32)

So in fact the embedding matrices got updated during the second round of training, even though I set both -fix_word_vecs_enc and -fix_word_vecs_dec.
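For a quick sanity check (assuming the two matrices are loaded as above), numpy makes the drift easy to quantify:

import numpy as np

# Compare the base and fine-tuned encoder embeddings loaded above.
# If the vectors had really been frozen, these would be (nearly) identical.
print(np.allclose(encoder_matrix_og, encoder_matrix_ft))
print(np.abs(encoder_matrix_og - encoder_matrix_ft).max())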

This was done with onmt version 0.9.1 and PyTorch version 1.1.0.

Hi,
This is due to the way the -train_from option works. When using this option, the model_opts are all reused from the loaded checkpoint; only the train_opts flags are taken into account. (You can see which are which here.)
We should add a way to override this, but there is no consensus on the best way to handle such cases right now.
An easy way to achieve what you want is to 'manually' edit the checkpoint: open a Python shell and do something like the following:

import torch

# Load the checkpoint, flip the flags in its saved options, and save it back out.
ckpt = torch.load("your_checkpoint.pt")
ckpt['opt'].fix_word_vecs_enc = True
ckpt['opt'].fix_word_vecs_dec = True
torch.save(ckpt, "new_checkpoint.pt")

and then train from the newly saved checkpoint.
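To double-check that the edit took before retraining, you can reload the new checkpoint and print the two flags (assuming the same checkpoint layout as above):

import torch

ckpt = torch.load("new_checkpoint.pt")
# Both should now print True.
print(ckpt['opt'].fix_word_vecs_enc, ckpt['opt'].fix_word_vecs_dec)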

Hope it helps!

When I tried the fix, I got this error:

File "/home/yiu/OpenNMT-py-master/train.py", line 200, in <module>
    main(opt)
File "/home/yiu/OpenNMT-py-master/train.py", line 86, in main
    single_main(opt, 0)
File "/home/yiu/OpenNMT-py-master/onmt/train_single.py", line 92, in main
    optim = Optimizer.from_opt(model, opt, checkpoint=checkpoint)
File "/home/yiu/OpenNMT-py-master/onmt/utils/optimizers.py", line 277, in from_opt
    optimizer.load_state_dict(optim_state_dict)
File "/home/yiu/OpenNMT-py-master/onmt/utils/optimizers.py", line 305, in load_state_dict
    self._optimizer.load_state_dict(state_dict['optimizer'])
File "/home/yiu/.conda/envs/ttext/lib/python3.7/site-packages/torch/optim/optimizer.py", line 115, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

I think you may need to pass -reset_optim all as well, because you don’t want to optimize the embeddings anymore, hence the size mismatch.

When I set -reset_optim all I was able to retrain, but it resets the optimizer and possibly other parts of the training state that I'm not aware of. I used the -reset_optim states option instead, and it seems to do just what I need.

The documentation is quite skimpy on this, but what is the difference between all the options of -reset_optim?

Yes, my bad, -reset_optim all might be a bit too much for your use case.
This is documented a bit in the original PR and in the code itself.

So it looks like -reset_optim states should reset the learning rate but load the parameters. However, when I used this flag the learning rate was kept the same as in the checkpoint instead of being reset to 1.
When they mention keeping the “optimizer” and resetting the “options”, what variables are they referring to specifically?

Options in that case are model_opts and train_opts. (The ones parsed here and saved into the checkpoint.)
If you want to easily reset the learning schedule, your best bet is to pass -reset_optim all together with the training options you'd like (i.e. the same as before, but updating the ones you want to change). Or you could do the same as you did for fix_word_vecs_enc/dec and reset those parameters inside the checkpoint before loading it.
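As a rough sketch of that last idea (the option names stored under ckpt['opt'] depend on your version, so inspect them first; learning_rate below is just an example of a schedule-related option you might reset):

import torch

ckpt = torch.load("your_checkpoint.pt")
print(vars(ckpt['opt']))            # see which training options were saved
ckpt['opt'].learning_rate = 1.0     # example: reset the starting learning rate
torch.save(ckpt, "new_checkpoint.pt")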