Hello, I’m trying to generate a vocabulary mapper file to improve the performance of M2M100, but I can’t apply the method described here: papers/WNMT2018/vmap at master · OpenNMT/papers (github.com).
The M2M100 model comes with a vocabulary file, but it is a plain text file rather than a phrase table. So I have to convert it, and the documentation says to run:
docker run --rm -v MYCORPUSPATH:/root/corpus build-pt CORPUSNAME SS TT N > phrase-table.gz
with CORPUSPATH/CORPUSNAME.{SS,TT} as the argument.
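To make my reading of that command concrete, here is a minimal Python sketch of how I understand the expected layout: two files named CORPUSNAME.SS and CORPUSNAME.TT inside the mounted folder. The function names and the example values (/data/corpus, train, en, fr, 4) are hypothetical and only for illustration; I am assuming nothing about build-pt beyond the command line quoted above, and N is passed through unchanged.

```python
from pathlib import Path


def expected_corpus_files(corpus_dir, corpus_name, src, tgt):
    """Return the two files implied by CORPUSPATH/CORPUSNAME.{SS,TT}."""
    base = Path(corpus_dir)
    return [base / f"{corpus_name}.{src}", base / f"{corpus_name}.{tgt}"]


def build_pt_command(corpus_dir, corpus_name, src, tgt, n):
    """Assemble the docker command quoted above as an argument list.

    The redirection to phrase-table.gz is done by the shell (or by the
    caller capturing stdout), so it is not part of the list.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{corpus_dir}:/root/corpus",  # mount the corpus folder
        "build-pt", corpus_name, src, tgt, str(n),
    ]


if __name__ == "__main__":
    # Hypothetical example values, not taken from the documentation.
    corpus_dir, name, src, tgt, n = "/data/corpus", "train", "en", "fr", 4

    # Report which of the expected files are missing before running docker.
    for path in expected_corpus_files(corpus_dir, name, src, tgt):
        status = "found" if path.is_file() else "MISSING"
        print(f"{status}: {path}")

    print(" ".join(build_pt_command(corpus_dir, name, src, tgt, n)))
```

If this reading is right, the script wants a pair of files, one per language, and not a single vocabulary file.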
Is it enough to pass the path to the vocabulary file? I haven’t got the script working yet, so any help would be appreciated.
If I understood correctly, the text must be in another format.
Note: I need the vocabulary in all languages, not just one.