A w2v-coupled training procedure

I would like to have a built-in word2vec network that computes the input/output embeddings during the first Nc epochs, while the main network builds a first-draft model coupled to them. Then, after Nc epochs, ONMT would switch back to its current behavior, with the embeddings adapted by SGD.
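To make the proposed schedule concrete, here is a minimal sketch in Python. It only illustrates the two-phase switch at Nc epochs; the names `embedding_mode` and `schedule` are hypothetical and are not existing ONMT options or functions, and the sketch assumes epochs are counted from 1.

```python
def embedding_mode(epoch: int, nc: int) -> str:
    """Return which component updates the embeddings at a 1-based epoch.

    Hypothetical helper: during the first `nc` epochs the embeddings come
    from the coupled word2vec network; afterwards they are adapted by SGD
    together with the rest of the main network.
    """
    return "word2vec" if epoch <= nc else "sgd"


def schedule(total_epochs: int, nc: int) -> list:
    """List (epoch, mode) pairs for a whole training run."""
    return [(epoch, embedding_mode(epoch, nc))
            for epoch in range(1, total_epochs + 1)]


if __name__ == "__main__":
    # Example run: 5 epochs in total, the first 2 coupled with word2vec.
    for epoch, mode in schedule(total_epochs=5, nc=2):
        print(f"epoch {epoch}: embeddings updated by {mode}")
```

With `nc=0` the schedule reduces to the current behavior, where the embeddings are adapted by SGD from the first epoch.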

Such a built-in procedure would greatly simplify the heavy work currently needed to obtain this properly: