Is there a way to use the Transformer-XL architecture for translation with OpenNMT? It seems like it would be a substantial improvement over the standard Transformer model due to its ability to model longer-term dependencies, but I haven't seen anyone implement it for machine translation directly.
Usually NMT systems translate single sentences, so in that setting the sequence-length limitation doesn't hurt the encoding. For pretraining, seq2seq architectures like MASS are better suited.
Anyway, there are XL implementations for PyTorch and TensorFlow.
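If it helps to see the core idea in code, here is a minimal PyTorch sketch of the segment-level recurrence that gives Transformer-XL its longer context. This is not OpenNMT code and not a faithful reimplementation (relative positional encodings and most other details are omitted); the layer sizes and function names are just illustrative assumptions.

```python
# Minimal sketch of Transformer-XL-style segment-level recurrence:
# hidden states from the previous segment are cached and reused as extra
# context ("memory") when attending over the current segment.
import torch
import torch.nn as nn


class RecurrentSegmentLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, seg, memory):
        # Keys/values cover the cached memory plus the current segment,
        # so attention can reach beyond the segment boundary.
        context = torch.cat([memory, seg], dim=1) if memory is not None else seg
        attn_out, _ = self.attn(seg, context, context)
        seg = self.norm1(seg + attn_out)
        seg = self.norm2(seg + self.ff(seg))
        return seg


def encode_long_sequence(x, layer, seg_len=128, mem_len=128):
    """Process a long (batch, time, d_model) tensor segment by segment."""
    memory, outputs = None, []
    for start in range(0, x.size(1), seg_len):
        seg = x[:, start:start + seg_len]
        out = layer(seg, memory)
        outputs.append(out)
        # Memory is detached: gradients do not flow across segment
        # boundaries, as in the original Transformer-XL training scheme.
        memory = out[:, -mem_len:].detach()
    return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    layer = RecurrentSegmentLayer()
    x = torch.randn(2, 512, 512)  # batch of 2, 512 time steps
    print(encode_long_sequence(x, layer).shape)  # torch.Size([2, 512, 512])
```

For sentence-level translation this extra memory buys little, which is why the existing OpenNMT Transformer is usually sufficient; the recurrence mainly pays off for document-length inputs.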