High-Rank RNN model as Encoder or Decoder

Is there a simple way to incorporate the currently best language model (in terms of perplexity) into OpenNMT?
I am wondering why nobody tests these language models for machine translation. Doesn't perplexity play a role in the model architecture of NMT?
A PyTorch implementation is available (Mixture of Softmaxes), so the more specific question would be: how could the model be used in the PyTorch branch of OpenNMT?
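To make the question concrete, here is a minimal sketch of what a Mixture-of-Softmaxes (MoS) output layer might look like as a drop-in replacement for the usual linear + softmax generator. The class name `MoSGenerator`, the `n_experts` parameter, and the idea of assigning it to `model.generator` are my own assumptions for illustration, not an existing OpenNMT-py API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoSGenerator(nn.Module):
    """Hypothetical Mixture-of-Softmaxes generator (sketch, not OpenNMT code)."""

    def __init__(self, hidden_size, vocab_size, n_experts=10):
        super().__init__()
        self.n_experts = n_experts
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size
        # mixture weights (prior) over the K softmax components
        self.prior = nn.Linear(hidden_size, n_experts)
        # projects the decoder state into K expert-specific contexts
        self.latent = nn.Linear(hidden_size, n_experts * hidden_size)
        # shared vocabulary projection, applied to each expert context
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden):
        # hidden: (batch, hidden_size) decoder output for one target position
        batch = hidden.size(0)
        # mixture weights pi_k, shape (batch, n_experts)
        pi = F.softmax(self.prior(hidden), dim=-1)
        # expert contexts h_k = tanh(W_k h), shape (batch * n_experts, hidden_size)
        h = torch.tanh(self.latent(hidden)).view(batch * self.n_experts, self.hidden_size)
        # per-expert softmax over the vocabulary, shape (batch, n_experts, vocab_size)
        probs = F.softmax(self.out(h), dim=-1).view(batch, self.n_experts, self.vocab_size)
        # weighted mixture of the K softmaxes -> high-rank distribution
        probs = torch.bmm(pi.unsqueeze(1), probs).squeeze(1)
        # return log-probabilities, as NLLLoss expects
        return torch.log(probs + 1e-8)
```

Since the generator in OpenNMT-py is, as far as I understand, a separate module applied to the decoder output, one could in principle swap it for something like `model.generator = MoSGenerator(hidden_size, vocab_size)` and train against NLLLoss, but I have not verified how this interacts with the existing loss computation or beam search.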

This may be a duplicate request, but my earlier hint got no answer: replacing softmax

There is related research with BPE and "Hybrid-LightRNN": Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation by Xiang Kong, Qizhe Xie, Zihang Dai, Eduard Hovy