Using sentencepiece (EN-DE pretrained OpenNMT-py model)

vvye · March 5, 2019, 9:01pm

Hi there,

I’m trying to use the pretrained English->German model (the very first one on http://opennmt.net/Models-py/). I’m getting the exact same results described in this question: the translation is entirely unrelated to the input.

The answer in that case was to use sentencepiece to tokenize the text, and the model I downloaded did come with a sentencepiece.model file. My problem, though, is that I don’t know what to do with it. How do you actually run sentencepiece? Simply having the file in the same directory as the en-de model doesn’t seem to be enough, and if there’s a command you need to use, I haven’t found it yet. None of the links I’ve seen so far have helped, so I guess I need more detailed instructions than you’d expect.

Thanks for helping a beginner figure this thing out!

vvye · March 6, 2019, 10:57am

I seem to have got it working. What I did was use the sentencepiece Python wrapper (https://github.com/google/sentencepiece/blob/master/python/README.md) and, instead of the test_model.model used in its examples, load OpenNMT’s sentencepiece.model. It’s really pretty obvious in hindsight.
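For anyone else who lands here, this is roughly what that looks like. Treat it as a minimal sketch, not the official recipe: the helper names (load_processor, tokenize_line, detokenize_line, tokenize_file) and the file names are made up for illustration, and it assumes the wrapper is installed with pip install sentencepiece and that sentencepiece.model is the file shipped with the pretrained model. It also assumes the model's output comes back as sentencepiece pieces that need decoding.

```python
def load_processor(model_path):
    """Load a sentencepiece model, e.g. the sentencepiece.model shipped with the EN-DE model."""
    import sentencepiece as spm  # assumption: installed via `pip install sentencepiece`
    sp = spm.SentencePieceProcessor()
    sp.Load(model_path)
    return sp


def tokenize_line(sp, line):
    """Split one raw sentence into pieces and join them with spaces for OpenNMT-py."""
    return " ".join(sp.EncodeAsPieces(line.strip()))


def detokenize_line(sp, line):
    """Turn one line of space-separated pieces back into plain text."""
    return sp.DecodePieces(line.strip().split())


def tokenize_file(sp, src_path, out_path):
    """Tokenize a whole file, one sentence per line (paths are hypothetical)."""
    with open(src_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            out.write(tokenize_line(sp, line) + "\n")


# Hypothetical usage:
#   sp = load_processor("sentencepiece.model")
#   tokenize_file(sp, "input.txt", "input.sp.txt")
#   ...run OpenNMT-py's translate.py with input.sp.txt as the source file...
#   then call detokenize_line(sp, line) on each line of the predictions.
```

The point is that the model never sees raw text: you feed the tokenized file to the translation step, then decode the predictions the same way in reverse.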