I am going to build a Catalan-to-Catalan translation system using OpenNMT. I wonder if there is a way to use embeddings from XLM-R. Please support me on this issue.
Hey Johnas Solomon,
as far as I know, you can only use word embeddings with OpenNMT, not a pretrained sequence encoder. But XLM was initially used as a pretrained encoder for low-resource languages. XLM-R ("Unsupervised Cross-lingual Representation Learning at Scale") has much the same transformer architecture, which should allow you to use the training scripts from XLM with adapted settings:
Depending on the version: Base (L = 12, H = 768, A = 12, 270M params) and Large (L = 24, H = 1024, A = 16, 550M params).
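If you are wondering where those parameter counts come from, here is a back-of-the-envelope sketch for the Base configuration. The per-layer breakdown (attention projections plus feed-forward block, biases and layer norms ignored) and the 250,002-token SentencePiece vocabulary are standard figures, but the arithmetic is only an estimate, not the exact implementation count:

```python
# Rough parameter-count estimate for XLM-R Base; exact figures differ slightly
# because biases, layer norms, and the LM head are ignored here.
L, H, V = 12, 768, 250_002  # layers, hidden size, XLM-R SentencePiece vocab

# Each transformer layer: ~4*H^2 for attention (Q, K, V, output projections)
# plus ~8*H^2 for the feed-forward block (H -> 4H -> H).
per_layer = 12 * H * H
embedding = V * H  # the token embedding matrix dominates in multilingual models

total = L * per_layer + embedding
print(f"~{total / 1e6:.0f}M parameters")  # lands near the reported 270M
```

Note how the embedding matrix alone accounts for roughly 192M of those parameters, which is typical for models with very large multilingual vocabularies.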
I am going to build Catalan to Catalan translation system
What is your aim? It sounds like an automatic grammar correction system.
Greetings from the translation space
Thank you for your reply, @Bachstelze. My aim is to build a translator from Catalan to Catalan Sign Language. The grammar of the two languages is different (e.g. input: "He sells food." Output (sign language sentence): "Food he sells").
Could you please elaborate on your answer? I'm very new to the field of deep learning.
My answer is that you can use different versions of XLM for translation with the PyTorch or fairseq frameworks. Moreover, there are pretrained multilingual seq2seq models. For OpenNMT you can use word embeddings like fastText, which has pretrained embeddings for Catalan. With word embeddings you have a fixed vocabulary, but I have seen a translation implementation that uses fastText's ability to construct embeddings as the sum of character n-gram vectors. The current OpenNMT implementation should be fine if you are only interested in translating standard sentences with no out-of-vocabulary words.
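To make the character n-gram idea concrete, here is a toy sketch of how fastText-style subword embeddings handle out-of-vocabulary words: the word's vector is composed from the vectors of its character n-grams, looked up via a hashing trick. All names, sizes, and the random "trained" vectors here are illustrative, not fastText's real API:

```python
import numpy as np

# Toy fastText-style subword embeddings: an out-of-vocabulary word's vector
# is the mean of vectors for its character n-grams (fastText uses a sum over
# n-grams plus the word vector; the hashing-into-buckets trick is the same).
rng = np.random.default_rng(0)
BUCKETS, DIM = 10_000, 50  # illustrative sizes, not fastText defaults
ngram_vectors = rng.normal(size=(BUCKETS, DIM))  # stands in for trained vectors

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"  # fastText pads words with boundary markers
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    grams = char_ngrams(word)
    idx = [hash(g) % BUCKETS for g in grams]
    return ngram_vectors[idx].mean(axis=0)

# Even a word never seen during "training" gets a vector from its pieces:
vec = word_vector("vendre")  # Catalan "to sell"
print(vec.shape)  # (50,)
```

This is why the subword route avoids the fixed-vocabulary limitation: any surface form decomposes into n-grams that were seen during training.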
Thank you @Bachstelze. It makes sense. Thank you for your helpful answer. I may seek your advice in the future.