OpenNMT Forum

How are the word embeddings learned during training?

(Yuan-Lu Chen) #1


How are the word embeddings learned during training?
The documentation says “Word embeddings are learned using a lookup table. Each word is assigned to a random vector within this table that is simply updated with the gradients coming from the network.”
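The lookup-table mechanism quoted from the documentation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not OpenNMT's code: the `LookupTable` class, its methods, the uniform initialization range and the plain SGD update are all hypothetical choices made for clarity.

```python
import random


class LookupTable:
    """Hypothetical minimal embedding table: one trainable vector per word."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.index = {word: i for i, word in enumerate(vocab)}
        # Each word starts out as a small random vector (assumed init range).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]

    def lookup(self, word):
        # Forward pass: the embedding is simply this word's row of the table.
        return self.weights[self.index[word]]

    def update(self, word, grad, lr):
        # Backward pass: plain SGD on the row that was looked up.
        row = self.weights[self.index[word]]
        for k in range(len(row)):
            row[k] -= lr * grad[k]


table = LookupTable(["the", "cat", "sat"], dim=4)
before = list(table.lookup("cat"))
table.update("cat", grad=[1.0, 0.0, -1.0, 0.5], lr=0.1)
after = table.lookup("cat")
print([round(a - b, 6) for a, b in zip(after, before)])  # prints [-0.1, 0.0, 0.1, -0.05]
```

The row moves by `-lr * grad`; nothing else in the table is touched.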

Can anyone give me more information on how OpenNMT learns the word embeddings automatically? Are there any research papers or citations I can read?

(jean.senellart) #2

Hello, the word embeddings do indeed come from a lookup table, and they sit at the end of the backpropagation chain. For each word, the parameters in the lookup table are updated with the gradient flowing back to that word.
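To make "the gradient flowing back to that word" concrete, here is a hedged sketch with a single linear + softmax layer on top of one embedding (an assumed toy network, not OpenNMT's architecture). It derives dLoss/d(embedding) with the chain rule, checks it against finite differences, and shows that an SGD step only changes the row of the word that was looked up.

```python
import math
import random


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def loss_and_grad(emb, W, target):
    """Cross-entropy of a linear + softmax layer applied to one embedding.

    Returns the loss and dLoss/d(emb): the gradient that reaches the
    lookup-table row at the end of the backpropagation chain.
    """
    logits = [sum(w * e for w, e in zip(row, emb)) for row in W]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target).
    dlogits = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    # Chain rule through the linear layer: dL/demb = W^T @ dlogits.
    demb = [sum(dlogits[j] * W[j][k] for j in range(len(W))) for k in range(len(emb))]
    return loss, demb


rng = random.Random(1)
dim, n_out = 3, 4
table = {w: [rng.gauss(0, 0.5) for _ in range(dim)] for w in ["a", "b", "c"]}
W = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(n_out)]

# Input word "b", correct output class 2.
loss, grad = loss_and_grad(table["b"], W, target=2)

# Sanity check: compare the analytic gradient with central finite differences.
eps = 1e-5
max_err = 0.0
for k in range(dim):
    plus, minus = list(table["b"]), list(table["b"])
    plus[k] += eps
    minus[k] -= eps
    numeric = (loss_and_grad(plus, W, 2)[0] - loss_and_grad(minus, W, 2)[0]) / (2 * eps)
    max_err = max(max_err, abs(numeric - grad[k]))
print("max gradient error:", max_err)

# SGD step: only the row of the word that was looked up is touched.
old = {w: list(v) for w, v in table.items()}
lr = 0.1
table["b"] = [e - lr * g for e, g in zip(table["b"], grad)]
print([w for w in table if table[w] != old[w]])  # ['b']
new_loss, _ = loss_and_grad(table["b"], W, 2)
print(loss, "->", new_loss)
```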

The simplest model that learns word embeddings this way is a single-layer LSTM language model, as described, for instance, in Mikolov 2013.
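As a much smaller stand-in for the LSTM language model mentioned above, the sketch below drops the recurrent layer entirely and trains a bigram neural language model: look up the previous word's embedding, predict the next word with a softmax, and backpropagate into both the output layer and the embedding row. The corpus, sizes and learning rate are arbitrary assumptions; the point is only that the embeddings are learned as a side effect of the language-modeling objective.

```python
import math
import random


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, lr = len(vocab), 8, 0.5
rng = random.Random(0)

# Trainable parameters: the embedding lookup table E and the output layer W.
E = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(V)]
W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(V)]


def epoch(train=True):
    total = 0.0
    for prev, nxt in zip(corpus, corpus[1:]):
        e = E[idx[prev]]                          # forward: look up the embedding
        logits = [sum(w * x for w, x in zip(row, e)) for row in W]
        probs = softmax(logits)
        total += -math.log(probs[idx[nxt]])       # loss for predicting the next word
        if not train:
            continue
        dlogits = [p - (1.0 if j == idx[nxt] else 0.0) for j, p in enumerate(probs)]
        # Gradient for the embedding row, computed before W is modified.
        de = [sum(dlogits[j] * W[j][k] for j in range(V)) for k in range(dim)]
        for j in range(V):                        # update the output layer
            for k in range(dim):
                W[j][k] -= lr * dlogits[j] * e[k]
        for k in range(dim):                      # update the lookup-table row in place
            e[k] -= lr * de[k]
    return total / (len(corpus) - 1)


initial = epoch(train=False)
for _ in range(100):
    epoch()
final = epoch(train=False)
print(f"avg loss {initial:.3f} -> {final:.3f}")
```

Swapping the bigram predictor for an LSTM changes how the context is summarized, but not how the lookup table is trained.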

OpenNMT does the same in both the encoder (for the source language) and the decoder (for the target language). HTH
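A hedged sketch of "the same in both the encoder and the decoder": assuming the two sides do not share embeddings, each owns its own lookup table over its own vocabulary, so a token that appears in both vocabularies still gets two independent vectors. `Seq2SeqEmbeddings`, `make_table` and `sgd` are hypothetical names used only for illustration.

```python
import random


def make_table(vocab, dim, rng):
    # One random vector per word; the dict plays the role of the lookup table.
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}


def sgd(table, grads, lr):
    # grads maps word -> gradient vector that backpropagation delivered to that row.
    for word, g in grads.items():
        table[word] = [x - lr * gx for x, gx in zip(table[word], g)]


class Seq2SeqEmbeddings:
    """Hypothetical container: the encoder and the decoder each own a lookup table."""

    def __init__(self, src_vocab, tgt_vocab, dim, seed=0):
        rng = random.Random(seed)
        self.src = make_table(src_vocab, dim, rng)  # encoder side (source language)
        self.tgt = make_table(tgt_vocab, dim, rng)  # decoder side (target language)

    def embed_source(self, tokens):
        return [self.src[t] for t in tokens]

    def embed_target(self, tokens):
        return [self.tgt[t] for t in tokens]


emb = Seq2SeqEmbeddings(["le", "chat", "Paris"], ["the", "cat", "Paris"], dim=4)
src_vecs = emb.embed_source(["le", "chat", "Paris"])
tgt_vecs = emb.embed_target(["the", "cat", "Paris"])
# "Paris" exists in both vocabularies but gets two independent vectors.
print(emb.src["Paris"] == emb.tgt["Paris"])  # False

tgt_before = {w: list(v) for w, v in emb.tgt.items()}
src_paris_before = list(emb.src["Paris"])
# Pretend backprop through the encoder produced a gradient for source "Paris".
sgd(emb.src, {"Paris": [1.0, 1.0, 1.0, 1.0]}, lr=0.1)
print(emb.src["Paris"] != src_paris_before, emb.tgt == tgt_before)  # True True
```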

(Yuan-Lu Chen) #3

Thanks, Jean! You just summarized the whole paper in a few sentences! This definitely helps!

(Wen Tsai) #4


I’m not sure exactly what you mean. I knew that embedding models for RNNs could be something like NNLM or RNNLM.

However, the paper you mentioned proposed word2vec. So is OpenNMT using the word2vec model here to pretrain the word embeddings?

I’m a little confused now.

Will pretrained embeddings also be updated during training?
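In a generic gradient-descent setup, pretrained vectors (for example from word2vec) are just the starting values of the lookup table: they keep being updated like any other parameter unless training is explicitly told to freeze them. The sketch below illustrates that under assumptions; `init_from_pretrained`, `sgd_step` and its `frozen` flag are hypothetical, and whether OpenNMT freezes pretrained vectors is decided by its own training options, not by this code.

```python
import random


def init_from_pretrained(vocab, pretrained, dim, seed=0):
    """Build a lookup table, copying pretrained vectors where available.

    Words missing from `pretrained` get a random vector, as in the
    from-scratch case.
    """
    rng = random.Random(seed)
    return {
        w: list(pretrained[w]) if w in pretrained
        else [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for w in vocab
    }


def sgd_step(table, grads, lr, frozen=False):
    # `frozen` is a hypothetical switch: when True the table is left unchanged.
    if frozen:
        return
    for w, g in grads.items():
        table[w] = [x - lr * gx for x, gx in zip(table[w], g)]


# Toy stand-in for word2vec-style pretrained vectors.
pretrained = {"cat": [0.2, -0.1, 0.4], "dog": [0.3, -0.2, 0.1]}
grads = {"cat": [1.0, 0.0, -1.0]}

trainable = init_from_pretrained(["cat", "dog", "emu"], pretrained, dim=3)
sgd_step(trainable, grads, lr=0.1)
print(trainable["cat"])  # moved away from the pretrained value

fixed = init_from_pretrained(["cat", "dog", "emu"], pretrained, dim=3)
sgd_step(fixed, grads, lr=0.1, frozen=True)
print(fixed["cat"])      # still the pretrained value
```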