I have obtained cross-lingual embeddings with the MUSE toolkit and want to use them as pretrained embeddings in OpenNMT. How can I do this? Is it the same procedure as using monolingual pretrained embeddings, as described in the OpenNMT tutorial? That is: first preprocess the data with preprocess.py to get data.vocab.pt; then run ./tools/embeddings_to_torch.py -emb_file_both path/to/cross-lingual-embedding -dict_file data/data.vocab.pt -output_file data/embeddings to get the pretrained encoder and decoder embeddings; and finally train the model with the new encoder and decoder embeddings?
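Concretely, the sequence of commands I have in mind is roughly the following (a sketch only: the data paths are placeholders, and I am assuming the standard OpenNMT-py options from the pretrained-embeddings FAQ, in particular -pre_word_vecs_enc / -pre_word_vecs_dec and the .enc.pt / .dec.pt output files of embeddings_to_torch.py):

```bash
# 1) Preprocess the parallel data to build the vocabularies (produces data/data.vocab.pt)
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
    -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt \
    -save_data data/data

# 2) Convert the MUSE cross-lingual vectors into OpenNMT tensors.
#    The same embedding file is used for both sides, since the vectors live in a shared space.
python ./tools/embeddings_to_torch.py -emb_file_both path/to/cross-lingual-embedding \
    -dict_file data/data.vocab.pt -output_file data/embeddings

# 3) Train with the converted encoder/decoder embeddings
#    (-word_vec_size must match the MUSE vector dimension, e.g. 300)
python train.py -data data/data -save_model model \
    -word_vec_size 300 \
    -pre_word_vecs_enc data/embeddings.enc.pt \
    -pre_word_vecs_dec data/embeddings.dec.pt
```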
Is my understanding correct? Could anybody help me with this, please? Thanks very much!