How are the word embeddings learned during training?
The documentation says “Word embeddings are learned using a lookup table. Each word is assigned to a random vector within this table that is simply updated with the gradients coming from the network.” (http://opennmt.net/OpenNMT/training/embeddings/)
Can anyone give me more information on how OpenNMT learns the word embeddings automatically? Are there any research papers or citations that I can read?
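For what it's worth, here is a minimal PyTorch sketch of the mechanism the quoted docs describe (this is an illustration, not OpenNMT's actual code): the lookup table is just a trainable matrix with one random row per word, and backprop through the rest of the network updates the rows that were looked up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, emb_dim = 10, 4

# The "lookup table": a vocab_size x emb_dim matrix of random vectors.
embedding = nn.Embedding(vocab_size, emb_dim)

# A toy downstream layer standing in for the rest of the network.
classifier = nn.Linear(emb_dim, 2)

optimizer = torch.optim.SGD(
    list(embedding.parameters()) + list(classifier.parameters()), lr=0.1
)

word_ids = torch.tensor([1, 3, 5])   # token indices into the table
targets = torch.tensor([0, 1, 0])    # toy labels for a fake task

used_row_before = embedding.weight[1].clone()    # row for a trained word
unused_row_before = embedding.weight[0].clone()  # row never looked up

for _ in range(5):
    optimizer.zero_grad()
    vectors = embedding(word_ids)     # differentiable row lookup
    loss = nn.functional.cross_entropy(classifier(vectors), targets)
    loss.backward()                   # gradients flow back into the table rows
    optimizer.step()                  # plain SGD: only looked-up rows move

# Rows that were looked up get updated by the gradients; unused rows stay put.
print("row 1 changed:", not torch.allclose(used_row_before, embedding.weight[1]))
print("row 0 changed:", not torch.allclose(unused_row_before, embedding.weight[0]))
```

So the embeddings are not trained by a separate algorithm; they are ordinary parameters of the translation model, optimized jointly with everything else against the training objective.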