Thanks for the fast replies!
I'm trying to extend seq2seq to applications other than NMT. IMO, the main benefit of multiple encoders is handling different forms of source data. For example, in an IoT case, one encoder could encode temperature readings, another could encode sound waves, and so on. It so happens that I have a sequential output of severity codes. Feeding these two forms of data, which come from different sources and have different value scales, into a single encoder does not seem ideal compared to concatenating the encoded hidden states from separate encoders. Unfortunately, I am not good enough at Lua to contribute technically.
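To make the idea concrete, here is a rough Python sketch of what I mean by concatenating the encoded hidden states. It is not OpenNMT code: the toy tanh RNN, the helper names (`make_rnn`, `encode`), the hidden sizes, and the sample data are all made up for illustration, and the weights are random rather than trained.

```python
import math
import random

def make_rnn(input_dim, hidden_dim, rng):
    """Random weights for a toy tanh RNN cell (hypothetical, untrained)."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
           for _ in range(hidden_dim)]
    return w_in, w_h

def encode(rnn, sequence):
    """Run the RNN over a sequence of vectors and return the final hidden state."""
    w_in, w_h = rnn
    h = [0.0] * len(w_h)
    for x in sequence:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[i], x)) +
                       sum(a * b for a, b in zip(w_h[i], h)))
             for i in range(len(h))]
    return h

rng = random.Random(0)
# One encoder per source, each with its own input size and hidden size.
temp_encoder = make_rnn(input_dim=1, hidden_dim=4, rng=rng)
code_encoder = make_rnn(input_dim=3, hidden_dim=6, rng=rng)

# Made-up data: raw temperature readings and one-hot severity codes.
temperatures = [[21.5], [22.0], [23.1], [22.7]]
severity_codes = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

# Each source is encoded separately, then the final states are concatenated
# into a single vector that a decoder could be initialised from.
decoder_init = encode(temp_encoder, temperatures) + encode(code_encoder, severity_codes)
```

The two sequences do not need to have the same length or scale, since each encoder only ever sees its own source; only the final states are joined.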
By the way, I'm very interested in the word features capability: http://opennmt.net/OpenNMT/data/word_features/
I'd appreciate it if you could help me confirm my understanding:
1) The word feature embeddings are optimized the same way as a normal word embedding in NMT, through gradient updates. If the word embedding is 100-dim and the feature embedding is 50-dim, the pair is updated as if it were a single 150-dim "word embedding", except that the update for the last 50 dims goes to the feature embedding.
2) If I turn fixed embeddings on (e.g. fix_word_vecs_enc), does it still update both the word and feature embeddings?
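In case it helps, this is the picture I have in mind for point 1, as a small Python sketch. It only illustrates my understanding and is not OpenNMT's implementation: the function names, the table sizes, the learning rate, and the plain SGD update are my own assumptions. The `freeze_word` flag is a hypothetical stand-in for one possible reading of 2), which is exactly what I am asking about.

```python
# Sizes taken from the example in point 1.
WORD_DIM, FEAT_DIM = 100, 50

def embed(word_table, feat_table, word_id, feat_id):
    """Concatenate one word vector and one feature vector into a 150-dim input."""
    return word_table[word_id] + feat_table[feat_id]

def apply_gradient(word_table, feat_table, word_id, feat_id, grad,
                   lr=0.1, freeze_word=False):
    """Split the gradient of the 150-dim input and update each table separately."""
    word_grad, feat_grad = grad[:WORD_DIM], grad[WORD_DIM:]
    if not freeze_word:  # hypothetical: word vectors fixed, features still trained
        word_table[word_id] = [w - lr * g
                               for w, g in zip(word_table[word_id], word_grad)]
    feat_table[feat_id] = [f - lr * g
                           for f, g in zip(feat_table[feat_id], feat_grad)]

# Tiny made-up vocabularies: 3 words and 2 feature values, initialised to zero.
word_table = [[0.0] * WORD_DIM for _ in range(3)]
feat_table = [[0.0] * FEAT_DIM for _ in range(2)]

x = embed(word_table, feat_table, word_id=1, feat_id=0)
grad = [1.0] * (WORD_DIM + FEAT_DIM)  # pretend gradient from the encoder
apply_gradient(word_table, feat_table, 1, 0, grad)
```

So the first 100 gradient entries land in the word table and the last 50 in the feature table, while rows that were not looked up stay untouched.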