=> The result is totally different from the result in the command line above (step 1).
I did the same process with the ende500k model, and that returned a good result because it has wmtende.model (I thought).
Questions:
Q1. I tried to generate the “sentencepiece_model” for my model with “https://github.com/google/sentencepiece” (because I cannot find any “wmtende.model” in my exported model), using:
-> So I had a model file named “ente.model”, and I put it into the config in the main function above.
But it still returns the wrong result.
=> Where can I find the sentencepiece_model of my model?
Q2. Is there any way to run my model in Python code without using **sentencepiece_model**?
The example assumes that the data were prepared using SentencePiece. If you did not use SentencePiece, you should adapt the example code to apply your tokenization instead.
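For example, if the model was trained on plain space-separated text, the SentencePiece encode/decode steps in the client can be replaced with a plain whitespace split. This is only a minimal sketch, assuming a client that tokenizes the request and detokenizes the response; the function names below are illustrative, not the ones from the actual example:

```python
# Minimal sketch: replace SentencePiece tokenization with whitespace splitting,
# assuming the model was trained directly on space-separated text.

def tokenize(text):
    # One token per whitespace-separated word, matching the training data format.
    return text.strip().split()

def detokenize(tokens):
    # Inverse of tokenize(): join the predicted tokens back with spaces.
    return " ".join(tokens)
```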
When I run SentencePiece from my home directory ‘~/’, I find my model there as ‘my_model.model’ (or whatever); yours should be ‘ente.model’, since you specified ‘ente’ as your prefix. That’s what you need to refer to for inference, and in your config.json file for serving, giving the full path to your model.
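For reference, producing that file with the SentencePiece Python bindings looks roughly like this; the input file name and vocabulary size below are placeholders, not values from this thread:

```python
import sentencepiece as spm

# Placeholder input file and vocab size; adjust to your own data.
spm.SentencePieceTrainer.Train(
    "--input=src-train.txt --model_prefix=ente --vocab_size=32000"
)
# This writes ente.model and ente.vocab into the current directory.
# The full path to ente.model is what goes into config.json.
```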
Yes, I didn’t use SentencePiece or anything else. I just arranged the data as one sentence per line and trained on it.
So can I skip using “sentencepiece_model”? Thank you!
If you post an e-mail address I’d be happy to send you my config.json file. But the TensorFlow model server expects to receive tokenized text, whether you use SentencePiece or another mode of tokenization. Personally I’ve found it easiest to use SentencePiece, but that means your training data and evaluation files all need to be processed with SentencePiece.
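Concretely, “processed with SentencePiece” means encoding every line of the training and evaluation files into pieces before training. A rough sketch with the Python bindings (file names are placeholders):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("ente.model")

# Encode a plain one-sentence-per-line file into subword pieces, line by line.
with open("src-train.txt") as fin, open("src-train.sp", "w") as fout:
    for line in fin:
        pieces = sp.EncodeAsPieces(line.strip())
        fout.write(" ".join(pieces) + "\n")
```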
It returns a text file with IDs in place of the words: 1231 4353 3132 3333, for my data file (e.g. src-train.txt, …).
But the data files of the example model in the quick start are normal sentences (https://s3.amazonaws.com/opennmt-trainingdata/toy-ende.tar.gz). I have no idea how to tokenize with the two SentencePiece output formats to get files like the “toy-ende” data.
I have mailed you a screenshot of my config file. Are you running spm_decode on your output? The Docker model server takes care of all that: I feed in raw sentences, they are spm-encoded, inference is done, and the output is spm-decoded back into raw sentences.
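In Python terms, that encode/infer/decode round trip looks roughly like the sketch below; the two SentencePiece output formats asked about above are just the pieces vs. their integer IDs (the sentence and model path are placeholders):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("ente.model")

sentence = "hello world"              # raw input sentence (placeholder)
pieces = sp.EncodeAsPieces(sentence)  # subword pieces, fed to the model
ids = sp.EncodeAsIds(sentence)        # the same tokens as integer vocabulary IDs

# ... inference would happen here; pretend the model echoed the pieces back ...
predicted_pieces = pieces

# spm_decode equivalent: join the predicted pieces back into a raw sentence.
print(sp.DecodePieces(predicted_pieces))
```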
@tel34 is referring to the nmt-wizard-docker integration which provides a wrapper around TensorFlow Serving and a configurable tokenization. See some instructions here:
Hi Duy. I’m curious, what’s your interest in the Tetun language? I developed a translator for Tetun a few years ago and am looking to upgrade to OpenNMT. Wondering if you’d be interested in collaborating?
To come straight to the point and solve the problem: if you don’t want to use SentencePiece, just make this tweak to the code of your Python client file and it will run.