Thank you guys for the awesome job on openNMT and seq2seq-Attention implementation!
May I ask about the initialization strategy for the special tokens `_pad`, `_unknown`, `_go`, and `_eos`?
They are not covered by the trained word embeddings, and I wonder which initialization works best for you?
Thank you for your time!
The padding embedding can be set to the zero vector. For the others, we did not experiment with different initializations. The default initialization of the LookupTable draws embeddings from a normal distribution, so you could simply do the same when preparing your embeddings.
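For illustration, here is a minimal sketch of that strategy in plain Python (the function name, token list, and standard deviation are assumptions, not part of OpenNMT's API): the padding token gets the zero vector, and the other special tokens are sampled from a normal distribution, mirroring the default LookupTable initialization.

```python
import random

def init_special_embeddings(dim, special_tokens=("_pad", "_unknown", "_go", "_eos"), std=1.0):
    """Build embedding vectors for special tokens (hypothetical helper).

    _pad is set to the zero vector; the others are drawn from N(0, std^2),
    which matches the default normal initialization of a LookupTable.
    """
    embeddings = {}
    for tok in special_tokens:
        if tok == "_pad":
            embeddings[tok] = [0.0] * dim
        else:
            embeddings[tok] = [random.gauss(0.0, std) for _ in range(dim)]
    return embeddings
```

You would then write these vectors into the same file as your prepared word embeddings before training.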