Hi, I trained an LSTM model with a shared vocabulary, using the -share_decoder_embeddings and -share_embeddings options.
The emb_luts embedding layer has 50004 words.
I saved the trained model, then continued training from the saved checkpoint (-train_from), this time setting the
-fix_word_vecs_enc and -fix_word_vecs_dec flags.
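For reference, the second run was launched roughly like this (the -data prefix and model paths here are placeholders, not my actual ones):

python train.py -data data/demo -train_from base_model.pt -save_model finetuned_model -fix_word_vecs_enc -fix_word_vecs_dec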
I expected the “fine-tuned” model to end up with the same embedding matrices as the base model, but when I actually compared them, that was not the case.
import torch
model_lstm = torch.load('base_model.pt', map_location='cpu')  # hypothetical paths for the
finetuned_lstm = torch.load('finetuned_model.pt', map_location='cpu')  # two checkpoints
encoder_matrix_og = model_lstm['model']['encoder.embeddings.make_embedding.emb_luts.0.weight'].cpu().numpy()
encoder_matrix_ft = finetuned_lstm['model']['encoder.embeddings.make_embedding.emb_luts.0.weight'].cpu().numpy()
print(encoder_matrix_og)
array([[ 0.15590172, -0.09746376, 0.20437966, …, 0.17598929,
0.08156995, 0.31481993],
[ 0.06886071, 0.09739305, 0.06408393, …, 0.02725093,
0.06250075, 0.03845875],
[-0.08549578, -0.02192731, 0.32102218, …, -0.14098114,
0.26072246, -0.11170442],
…,
[ 0.05883738, -0.06367233, 0.08204132, …, 0.36408105,
-0.06660978, -0.095727 ],
[-0.00657062, 0.01990662, -0.03282089, …, 0.04410684,
0.06983539, -0.05920906],
[ 0.0405694 , 0.11745881, 0.13548265, …, -0.09362546,
0.07424163, -0.03483336]], dtype=float32)

print(encoder_matrix_ft)
array([[ 0.02719062, -0.27679086, 0.175478 , …, 0.08351177,
0.15322194, 0.12799822],
[ 0.06887297, 0.09740731, 0.06408892, …, 0.02722452,
0.06248574, 0.03845735],
[-0.08500054, 0.02577828, 0.3388119 , …, -0.12022047,
0.25378993, -0.10860381],
…,
[ 0.06116741, -0.060183 , 0.0839922 , …, 0.36391202,
-0.06733748, -0.09531425],
[-0.00708292, 0.01796832, -0.03329238, …, 0.04410985,
0.07168514, -0.05981572],
[ 0.03849256, 0.11605936, 0.14136563, …, -0.09195911,
0.07555307, -0.03628784]], dtype=float32)
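To confirm this is not just a display issue, a quick element-wise comparison (plain numpy, nothing OpenNMT-specific):

import numpy as np
# True would mean the encoder embeddings were left untouched; here it prints False
print(np.allclose(encoder_matrix_og, encoder_matrix_ft))
# fraction of entries that differ between the two checkpoints
print((encoder_matrix_og != encoder_matrix_ft).mean())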
So the embedding matrices were in fact updated during the second round of training, even though I set both -fix_word_vecs_enc and -fix_word_vecs_dec.
This was done with OpenNMT-py version 0.9.1 and PyTorch version 1.1.0.
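For completeness, one way to verify the flags were recorded at all, assuming the usual OpenNMT-py checkpoint layout where the training options are stored under the 'opt' key as an argparse Namespace:

# inspect the options saved in the fine-tuned checkpoint
opt = finetuned_lstm['opt']
print(getattr(opt, 'fix_word_vecs_enc', None), getattr(opt, 'fix_word_vecs_dec', None))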