How are the word embeddings learned during training?
The documentation says “Word embeddings are learned using a lookup table. Each word is assigned to a random vector within this table that is simply updated with the gradients coming from the network.” (http://opennmt.net/OpenNMT/training/embeddings/)
Can anyone give me more information on how OpenNMT learns the word embeddings automatically? Are there any research papers or citations that I can read?
Hello, the word embeddings do indeed come from a lookup table, which sits at the end of the backpropagation chain. For each word in a batch, the corresponding row of the lookup table is updated with the gradient that flows back to that word.
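For illustration only (this is a minimal PyTorch sketch, not OpenNMT's actual code), here is how such a lookup table gets its gradients: the embedding matrix is just another parameter, and after the backward pass only the rows for the words that appeared in the batch carry non-zero gradients.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 4
lookup = nn.Embedding(vocab_size, emb_dim)   # table initialised with random vectors

word_ids = torch.tensor([2, 5, 2])           # a toy batch of word indices
vectors = lookup(word_ids)                   # rows 2, 5, 2 of the table

# Any downstream loss works; here just a dummy scalar.
loss = vectors.sum()
loss.backward()

# Only the rows that were looked up receive gradient; the optimiser
# then updates those rows like any other parameter of the network.
print(lookup.weight.grad[2])   # non-zero (word 2 appeared twice)
print(lookup.weight.grad[0])   # all zeros (word 0 was not in the batch)
```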
The simplest model that learns word embeddings this way is a single-layer LSTM language model, as described for instance in Mikolov 2013.
OpenNMT does the same thing in both the encoder for the source language and the decoder for the target language. HTH
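As a rough sketch of that idea (a simplified toy model, not the actual OpenNMT architecture), the encoder and decoder each own their embedding table, and both tables are trained end to end through the same translation loss:

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder: one embedding table per language, one LSTM each."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # source lookup table
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)   # target lookup table
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))    # encode source words
        out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.proj(out)                             # logits over target vocab

model = TinySeq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (8, 7))    # batch of 8 source sentences, length 7
tgt = torch.randint(0, 120, (8, 9))    # batch of 8 target sentences, length 9

# In practice the decoder input would be the target shifted by one token;
# here we reuse tgt on both sides just to show the gradient flow.
logits = model(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, 120), tgt.reshape(-1))
loss.backward()   # gradients flow into both lookup tables
```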