Is there a way to use the Transformer-XL architecture for translation with OpenNMT? It seems like it would be a substantial improvement over the standard Transformer model due to its ability to model longer-term dependencies, but I haven't seen anyone implement it for machine translation directly.
Usually NMT systems translate single sentences, so in that setting the sequence-length limitation doesn't hurt the encoding. For pretraining, seq2seq architectures like MASS are better suited.
Anyway, there are XL implementations for PyTorch and TensorFlow.
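If it helps to see the core idea in code, here is a minimal PyTorch sketch of the segment-level recurrence that gives Transformer-XL its longer context. This is not OpenNMT code and not a faithful reimplementation (relative positional encodings and most other details are omitted); the layer sizes and function names are just illustrative assumptions.

```python
# Minimal sketch of Transformer-XL-style segment-level recurrence:
# hidden states from the previous segment are cached and reused as extra
# context ("memory") when attending over the current segment.
import torch
import torch.nn as nn


class RecurrentSegmentLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, seg, memory):
        # Keys/values cover the cached memory plus the current segment,
        # so attention can reach beyond the segment boundary.
        context = torch.cat([memory, seg], dim=1) if memory is not None else seg
        attn_out, _ = self.attn(seg, context, context)
        seg = self.norm1(seg + attn_out)
        seg = self.norm2(seg + self.ff(seg))
        return seg


def encode_long_sequence(x, layer, seg_len=128, mem_len=128):
    """Process a long (batch, time, d_model) tensor segment by segment."""
    memory, outputs = None, []
    for start in range(0, x.size(1), seg_len):
        seg = x[:, start:start + seg_len]
        out = layer(seg, memory)
        outputs.append(out)
        # Memory is detached: gradients do not flow across segment
        # boundaries, as in the original Transformer-XL training scheme.
        memory = out[:, -mem_len:].detach()
    return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    layer = RecurrentSegmentLayer()
    x = torch.randn(2, 512, 512)  # batch of 2, 512 time steps
    print(encode_long_sequence(x, layer).shape)  # torch.Size([2, 512, 512])
```

For sentence-level translation this extra memory buys little, which is why the existing OpenNMT Transformer is usually sufficient; the recurrence mainly pays off for document-length inputs.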