OpenNMT Forum

TensorFlow REST API + SentencePiece


The sample en_de Python client on GitHub works fine, but it has too many dependencies to use as a basis for developing a client for calls from remote machines. What I’d like to do is implement a REST client with the SentencePiece TF module. The SentencePiece repo links to sample code which doesn’t work for me, so I would appreciate any pointers and hints.


Thanks for the question. From the client’s perspective, what is the difference between using the SentencePiece TF module and a Python module? Do you mean you want to include the SentencePiece module in the graph?

I don’t know and haven’t worked with the internals of TF yet, so bear with me :slight_smile: From my understanding of SentencePiece’s documentation, using the TF module allows the user to send raw sentences for inference, with tokenization then performed “server-side” (?). In other words, is there a way to make a simple call with no tokens, just the raw sentence, and have the tokenization performed on the server, or does this require additional custom layers? I would like to work on the current Trados plugin and add an option to use OpenNMT-tf through TF Serving instead of OpenNMT-lua’s REST server.
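To make the idea concrete, here is the difference as I picture it on the wire (the field names below are purely illustrative, not an actual OpenNMT or TF Serving schema):

```python
import json

# Client-side tokenization: the client must run SentencePiece itself
# and ship the resulting pieces (field names purely illustrative).
tokenized_request = {"tokens": ["▁Hello", "▁world", "!"]}

# Server-side tokenization: the SentencePiece op lives in the graph,
# so the client ships only the raw sentence.
raw_request = {"text": "Hello world!"}

print(json.dumps(raw_request))  # → {"text": "Hello world!"}
```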

This is true if the model is exported using this module (so OpenNMT-tf code has to be modified) and TensorFlow Serving binaries are compiled against the SentencePiece operator.

I know the serving story is still incomplete for OpenNMT-tf (i.e. requires some work) and I have in mind to make the OpenNMT tokenizer a custom TensorFlow op that can be used in serving.

For now, can you check if using the nmt-wizard-docker approach works for you?

All clear now! So, although I could use nmt-wizard-docker right away, I should probably wait for you to add this feature to the standalone OpenNMT-tf project before I can work on the plugin.
Thanks for your help and all your great work!


I’ll be watching this space :-). I’m a bit behind you but am looking at the same issues!

Hi, I followed the instructions to the letter and everything gets set up nicely. However, when I send off curl -X POST http://localhost:5000/translate -d '{"src":[{"text": "Hello world!"}]}', I get the response from the model server shown below, whatever the source input. Any ideas what’s happening, please?
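For reference, here is the same request from Python using only the standard library; the response handling is commented out, and the "tgt" field I read from it is my assumption about the response schema:

```python
import json
from urllib import request

# Same payload as the curl call above.
payload = {"src": [{"text": "Hello world!"}]}
req = request.Request(
    "http://localhost:5000/translate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the serving container is up; "tgt" is my assumption
# about the response schema:
# with request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["tgt"][0][0]["text"])
```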

Is it with the same pretrained model?

Yes, it is with the pretrained model. I followed the instructions exactly.

Going through the server logs, I suspect this is a problem with Docker somehow not finding the GPU. I see the message “failed call to cuInit” and then “no NVIDIA GPU device is present: /dev/nvidia0 does not exist”. But of course it does exist, and TensorFlow knows about it, as I’ve just trained some five TF models with it. I’m guessing that the gRPC ModelServer is falling back to the CPU to deliver that phantom output?

I now have the nmt-wizard-docker approach working fine with a TensorFlow model trained with my own data (Tagalog/English). In fact, once the SentencePiece model is built, its workings (tokenization/detokenization, with/without BPE) are quite seamless. So far I’ve been running my tests with curl and will now adapt one of my clients to handle TF requests.


So I modified the plugin and added an option to also work with nmt-wizard-docker. I’d like to ask whether alignments are returned by default from compatible models, and which ones, or whether one has to run Docker with a relevant option (this might have been answered already, but I can’t find it in the forums).

I made the REST client’s code a bit more modular, so it is now easier to add more options to connect to different APIs for OpenNMT-tf (once ready) and OpenNMT-py. I will clean up the code a bit more and upload it to SDL’s App Store, but here’s how the plugin looks now:

They are returned on a best-effort basis. Currently, they are only returned with the OpenNMT-lua backend, but I should at least add them for OpenNMT-tf. Note that for alignments from a Transformer model to be usable, the model must be trained with guided alignment.

Nice! Note that nmt-wizard-docker is a useful wrapper around both OpenNMT-tf and OpenNMT-py, so you can use a single REST interface for translation. And actually, you can also translate with Google, DeepL, Baidu, and other providers with the same API call.

You are right, and actually I thought a bit about this: whether I should remove the Lua server interface completely and use only nmt-wizard-docker’s, which covers all frameworks. But I thought it would be safer to keep compatibility with the Lua server for a while and maybe add options for each framework’s separate REST interface. We’ll see… :slightly_smiling_face:


I’m still keeping hold of my Lua stuff for the time being :slight_smile:

The TensorFlow team recently released tensorflow/text as a way to facilitate text processing in TensorFlow graphs. It currently includes a WordPiece tokenizer (presumably used by Google Translate), and I read that SentencePiece support is planned.

This could remove the need for a proxy between the client and the model for standard use cases (i.e. non-complex preprocessing). I will look into that.
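As a sketch of what that would enable: the client could post raw text straight to TensorFlow Serving’s REST endpoint, with no tokenization proxy in between. The model name and input field below are illustrative; the URL follows TensorFlow Serving’s default REST port and predict route.

```python
import json
from urllib import request

# Raw text straight to TensorFlow Serving's REST API; the model name
# "ende" and the "text" input key are illustrative.
payload = {"instances": [{"text": "Hello world!"}]}
req = request.Request(
    "http://localhost:8501/v1/models/ende:predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment against a running TensorFlow Serving instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["predictions"])
```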


Hi @guillaumekln,

This looks very interesting indeed. I guess this is planned for TensorFlow 2, right?

I did not find an exact timeline. Hopefully, it will be nicely integrated by the time TF 2.0 is released.

We are slowly getting there. TensorFlow Text added a SentencePiece tokenizer which should be available in TensorFlow Serving in the near future. Once released, I plan to work on a tutorial that goes from zero to a ready-to-use TensorFlow Serving instance in a few steps (finally…).