OpenNMT Forum

Running an OpenNMT-tf model using TensorFlow Serving

Hi everyone,
as part of an experiment, I want to compare OpenNMT-py and OpenNMT-tf models trained on the same data with the same hyperparameters. The -py model is already trained and runs on a server using Apache + Flask in a Docker container, so you can send a request (with curl, for example) containing the text to translate and receive the translation back. I want to achieve the same behavior for the -tf model using TensorFlow Serving. The -tf model has already been trained as well; I averaged the last several checkpoints and exported it as a SavedModel. So I have the following questions:

  1. The model for TensorFlow Serving has the following structure: a frozen graph, a variables directory, and an assets directory containing the dictionaries. I have to put the vocabularies that I used for training into this directory, am I right?

  2. To run the -py model on the server, I use pyonmttok as the tokenizer, providing it a trained BPE model, so the tokenization part of my JSON config looks like:

         "tokenizer": {
             "type": "pyonmttok",
             "mode": "aggressive",
             "params": {
                 "bpe_model_path": "tokenizer/pyonmttok_files/merge.bpe",
                 "joiner": "@@"
             }
         }

     What should the config for the -tf model look like, and where should I put it? Unfortunately, very little information is provided.

  3. There are two examples for TensorFlow Serving in the OpenNMT-tf GitHub repository; which of them should I follow to achieve the behavior described above?

Sorry if my questions seem rather silly, but I have never used TensorFlow Serving and I don’t have much spare time to dig deep into it.

Hi,

  1. Vocabularies are automatically embedded in the exported SavedModel. No manual action should be required.
  2. The issue is that TensorFlow Serving only runs TensorFlow ops, and the OpenNMT Tokenizer is not one. If you want to use TensorFlow Serving specifically, you need to find a way to tokenize the data before sending the translation request. See http://opennmt.net/OpenNMT-tf/serving.html#input-preprocessing-and-tokenization for more information.
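     For example, once the text is tokenized on the client side (e.g. with pyonmttok and your BPE model), building the REST predict payload could look roughly like this. This is a minimal sketch: the `tokens` and `length` input names are assumptions based on a typical OpenNMT-tf export signature, so check your own SavedModel's signature first.

         import json

         def build_predict_request(tokens):
             """Build a TensorFlow Serving REST predict payload for one
             pre-tokenized sentence (input names assumed: tokens, length)."""
             return json.dumps({
                 "inputs": {
                     "tokens": [tokens],       # batch of one tokenized sentence
                     "length": [len(tokens)],  # sentence length in tokens
                 }
             })

         # Tokens as produced client-side, e.g. by pyonmttok with a BPE model:
         payload = build_predict_request(["Hel@@", "lo", "world"])

     The payload would then be POSTed to the serving endpoint, e.g. `http://host:port/v1/models/<model_name>:predict`.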
  3. It depends on what you want to achieve. If you want to support concurrent users with the ability to batch multiple incoming requests then you should use TensorFlow Serving. Otherwise it’s probably easier to write a simple Python HTTP server inspired by the “python” serving example.
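A minimal sketch of such a Python HTTP server, using only the standard library; the `translate` function here is a hypothetical stand-in for actually running the exported model:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def translate(text):
        # Hypothetical stand-in: a real server would tokenize the input,
        # run the exported OpenNMT-tf SavedModel, and detokenize the output.
        return text.upper()

    class TranslateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Expect a JSON body like {"text": "sentence to translate"}
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            reply = json.dumps({"translation": translate(body["text"])}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    # To serve: HTTPServer(("0.0.0.0", 5000), TranslateHandler).serve_forever()

This mirrors the curl-based workflow you described for the -py model, but note it handles one request at a time and does no batching, which is exactly what TensorFlow Serving would add.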

Why do you want to try OpenNMT-tf if you already have an OpenNMT-py server running? Generally speaking, it will involve more work at this time.