Will pretrained embeddings also be updated during training?


(Wen Tsai) #1

According to the documentation, if no pretrained embeddings are provided, each word embedding is initialized with a random vector and then updated with the gradients coming from the network.

But if pretrained embeddings are specified, will they be updated as well? If they are going to be updated anyway, is it necessary to provide pretrained embeddings at all?

Also, is there any reference or literature I can read about how the embeddings are updated?

(Eva) #2

Hi @WenTsai!

You can train a translation model using pretrained embeddings and keep them fixed, that is, they will not be updated during training. To do this, set the -fix_word_vecs_enc and -fix_word_vecs_dec input parameters to true. This is helpful if you want to preserve the semantics of your pretrained embeddings without mixing them with the training data.
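To make "fixed" concrete, here is a minimal pure-Python sketch of a lookup table with a hypothetical `fixed` switch that plays the same role as those two parameters. The class and method names are illustrative assumptions, not OpenNMT's code: a fixed table can receive a gradient, but it never applies it, so the pretrained values survive training unchanged.

```python
class EmbeddingTable:
    """Hypothetical lookup table: one vector (row) per vocabulary index."""

    def __init__(self, rows, fixed=False):
        self.weight = [list(r) for r in rows]  # copy so the caller's vectors are untouched
        self.fixed = fixed  # analogous in spirit to -fix_word_vecs_enc / -fix_word_vecs_dec

    def lookup(self, index):
        return self.weight[index]

    def sgd_step(self, index, grad, lr):
        # A fixed table keeps its pretrained values whatever gradient arrives.
        if self.fixed:
            return
        self.weight[index] = [w - lr * g for w, g in zip(self.weight[index], grad)]


pretrained = [[0.5, -0.25], [0.125, 0.75]]
frozen = EmbeddingTable(pretrained, fixed=True)
tuned = EmbeddingTable(pretrained, fixed=False)
for table in (frozen, tuned):
    table.sgd_step(1, grad=[1.0, -1.0], lr=0.5)  # same gradient for both tables

print(frozen.lookup(1))  # [0.125, 0.75]
print(tuned.lookup(1))   # [-0.375, 1.25]
```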

Indeed, if you do not specify any pretrained embeddings, the system learns the word representations from your training data throughout the whole training process.

However, it is sometimes useful to start from pretrained embeddings that already capture other semantics (for instance, general-domain word embeddings), which are then specialized towards the information in the MT training data.
Pretrained embeddings can be expected to help the system produce better translations, but if you have enough data, the gain from pretrained embeddings becomes less clear after some epochs.
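A common way to start from pretrained vectors is to copy the pretrained vector for every vocabulary word that has one and fall back to a small random vector for the rest; training then moves all rows unless they are fixed. This sketch assumes the pretrained vectors are available as a plain word-to-vector dict; the helper name, the uniform init range and the seed are illustrative assumptions, not OpenNMT's loader.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, init_range=0.1, seed=0):
    """Hypothetical helper: one row per vocab word, pretrained where available."""
    rng = random.Random(seed)  # seeded so the random rows are reproducible
    matrix, covered = [], 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # start from the general-domain vector
            covered += 1
        else:
            # No pretrained vector: random init, learned from the MT data alone
            matrix.append([rng.uniform(-init_range, init_range) for _ in range(dim)])
    return matrix, covered


vocab = ["<unk>", "the", "contract", "plaintiff"]
pretrained = {"the": [0.2, 0.1, -0.3], "contract": [0.5, -0.4, 0.0]}
matrix, covered = build_embedding_matrix(vocab, pretrained, dim=3)
print(covered)    # 2
print(matrix[1])  # [0.2, 0.1, -0.3]
```

The covered count is a quick sanity check: if only a small fraction of the vocabulary has a pretrained vector, most rows start random and the pretrained initialization matters less.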

As far as I know, the embeddings are updated the same way as the other OpenNMT modules: the error gradients are backpropagated through the lookup table that implements them.
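Backpropagating through a lookup table is simple because the lookup only selects rows: the gradient with respect to the table is zero everywhere except at the rows that were looked up, and a word that occurs several times in a batch accumulates one contribution per occurrence. This pure-Python sketch shows that scatter-add followed by a plain SGD step; the function names, the choice of plain SGD and the toy numbers are assumptions for illustration (the optimizer OpenNMT actually uses may differ).

```python
def embedding_backward(num_rows, dim, indices, grad_outputs):
    """Scatter-add the gradient of each looked-up vector into its table row."""
    grad_table = [[0.0] * dim for _ in range(num_rows)]
    for idx, g in zip(indices, grad_outputs):
        for d in range(dim):
            grad_table[idx][d] += g[d]  # repeated tokens accumulate
    return grad_table


def sgd_update(table, grad_table, lr):
    """Plain SGD step; rows whose gradient is all zero are left unchanged."""
    for row, grad in zip(table, grad_table):
        for d in range(len(row)):
            row[d] -= lr * grad[d]


table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
indices = [1, 2, 1]  # token 1 appears twice in this toy "sentence"
grad_outputs = [[0.5, 0.0], [0.25, 0.25], [0.5, 1.0]]  # dLoss/d(looked-up vector)
grad_table = embedding_backward(len(table), 2, indices, grad_outputs)
sgd_update(table, grad_table, lr=1.0)
print(grad_table)  # [[0.0, 0.0], [1.0, 1.0], [0.25, 0.25]]
print(table)       # [[0.0, 0.0], [0.0, 0.0], [1.75, 1.75]]
```

Row 0 never appears in the input, so it gets no gradient and keeps its value; this is why rare words move slowly and why starting them from good pretrained vectors can help.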
You can find more related information in these other forum posts:

(Wen Tsai) #3

Hi @emartinezVic,

Thanks for your detailed explanation; it helped me clarify the relationship between pretrained embeddings and the training process.

I can then train one model with fixed pretrained embeddings and one with updated pretrained embeddings to see which works better.

Thanks a lot!