OpenNMT Forum

How to use serving on OpenNMT-tf V2

Hello,

I built a model using OpenNMT-tf V2 and tried to run the exported model with TensorFlow Serving. I used the command that had previously served a model from an earlier version of OpenNMT-tf on CPU without issue, but it fails now.

tensorflow_model_server --rest_api_port=9004 --model_name=my_model --model_base_path=/run/best

According to this issue, I switched to opennmt/tensorflow-serving:2.0.0-gpu instead of the official TensorFlow Serving image.

Here is the command:

docker run -t --rm -p 9004:9004 -v $PWD:/models \
--name tensorflow_serving --entrypoint tensorflow_model_server \
opennmt/tensorflow-serving:2.0.0-gpu \
--enable_batching=true --batching_parameters_file=/models/batching_config.txt \
--port=9004 --model_base_path=/models/run/best --model_name=my_model
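For anyone reproducing this, the `--batching_parameters_file` flag points at a text-format protobuf of TensorFlow Serving's batching parameters. The values below are only illustrative placeholders, not the ones from my setup:

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```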

And here is the log:

2019-12-26 08:36:33.602414: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config:  model_name: my_model model_base_path: /models/run/best
2019-12-26 08:36:33.603534: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2019-12-26 08:36:33.603582: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: my_model
2019-12-26 08:36:33.704264: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: my_model version: 500}
2019-12-26 08:36:33.704346: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: my_model version: 500}
2019-12-26 08:36:33.704391: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: my_model version: 500}
2019-12-26 08:36:33.704435: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/run/best/500
2019-12-26 08:36:33.764514: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-12-26 08:36:33.844227: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-26 08:36:33.845301: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-26 08:36:33.845323: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-12-26 08:36:33.845354: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2019-12-26 08:36:33.973097: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-12-26 08:36:34.947318: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /models/run/best/500
2019-12-26 08:36:35.381295: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 1676844 microseconds.
2019-12-26 08:36:35.395127: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2019-12-26 08:36:35.395179: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2019-12-26 08:36:35.396086: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/run/best/500/assets.extra/tf_serving_warmup_requests
2019-12-26 08:36:35.402262: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: my_model version: 500}
2019-12-26 08:36:35.410654: I tensorflow_serving/model_servers/server.cc:353] Running gRPC ModelServer at 0.0.0.0:9004 ...

Then I made a request:

curl -d '{"inputs": {"tokens":[["shān", "lù", "zhòng", "bìng", "yǒu", "yí", "duàn", "yá"]], "length":[8]}}' \
-X POST http://0.0.0.0:9004/v1/models/my_model:predict
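The same request can also be sent from Python, which makes the response easier to inspect than raw curl output. This is a minimal sketch using only the standard library; the endpoint URL, model name, and payload shape mirror the curl command above (it assumes the server is reachable on port 9004):

```python
import json
import urllib.request

def build_predict_request(tokens):
    """Build the REST predict payload for an exported OpenNMT-tf model:
    a batch of token lists plus the length of each sequence."""
    return {
        "inputs": {
            "tokens": [tokens],
            "length": [len(tokens)],
        }
    }

def send_predict(payload, url="http://0.0.0.0:9004/v1/models/my_model:predict"):
    """POST the JSON payload to the TensorFlow Serving REST endpoint
    and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    payload = build_predict_request(
        ["shān", "lù", "zhòng", "bìng", "yǒu", "yí", "duàn", "yá"])
    print(json.dumps(payload, ensure_ascii=False))
```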

I only got the following message:

Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.

I added --output - to the command, but there was no output at all.

Did I miss something? What should I do to successfully serve my model with OpenNMT-tf V2?

Hi,

You should set --rest_api_port instead of --port on the serving command line. --port only opens the gRPC endpoint (which is why the log says "Running gRPC ModelServer at 0.0.0.0:9004"), so the REST request never reaches a REST listener.
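For reference, your docker command with that one flag changed (everything else as in your post) would be:

```shell
docker run -t --rm -p 9004:9004 -v $PWD:/models \
    --name tensorflow_serving --entrypoint tensorflow_model_server \
    opennmt/tensorflow-serving:2.0.0-gpu \
    --enable_batching=true --batching_parameters_file=/models/batching_config.txt \
    --rest_api_port=9004 --model_base_path=/models/run/best --model_name=my_model
```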

It works! Thank you very much! :slightly_smiling_face: