"cuDNN launch failure" Error when I use tensorflow_serving

Hello guys,
I want to start using TensorFlow Serving with GPU for translation, so I followed the steps in Inference with TensorFlow Serving and also used the pretrained model (averaged-ende-export500k-v2).
When I run

ende_client.py

and enter a sentence to check the translation, I get the error below:

Traceback (most recent call last):
  File "ende_client.py", line 116, in <module>
    main()
  File "ende_client.py", line 109, in main
    stub, args.model_name, [text], tokenizer, timeout=args.timeout
  File "ende_client.py", line 84, in translate
    result = future.result()
  File "/home/hadis/Downloads/serve/OpenNMT-tf-master/examples/serving/tensorflow_serving/trans_env/lib/python3.6/site-packages/grpc/_channel.py", line 722, in result
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([1,3,512,1])
[[{{node transformer_base_1/self_attention_encoder_1/self_attention_encoder_layer_6/transformer_layer_wrapper_30/layer_norm_33/FusedBatchNormV3}}]]
(1) Internal: cuDNN launch failure : input shape ([1,3,512,1])
[[{{node transformer_base_1/self_attention_encoder_1/self_attention_encoder_layer_6/transformer_layer_wrapper_30/layer_norm_33/FusedBatchNormV3}}]]
[[StatefulPartitionedCall_4/StatefulPartitionedCall/transformer_base_1/StatefulPartitionedCall_1/Minimum/_737]]
0 successful operations.
0 derived errors ignored."
debug_error_string = "{"created":"@1612101329.953582644","description":"Error received from peer ipv4:127.0.0.1:9000","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"2 root error(s) found.\n (0) Internal: cuDNN launch failure : input shape ([1,3,512,1])\n\t [[{{node transformer_base_1/self_attention_encoder_1/self_attention_encoder_layer_6/transformer_layer_wrapper_30/layer_norm_33/FusedBatchNormV3}}]]\n (1) Internal: cuDNN launch failure : input shape ([1,3,512,1])\n\t [[{{node transformer_base_1/self_attention_encoder_1/self_attention_encoder_layer_6/transformer_layer_wrapper_30/layer_norm_33/FusedBatchNormV3}}]]\n\t [[StatefulPartitionedCall_4/StatefulPartitionedCall/transformer_base_1/StatefulPartitionedCall_1/Minimum/_737]]\n0 successful operations.\n0 derived errors ignored.","grpc_status":13}"
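
For reference, the call that fails here is essentially an asynchronous gRPC Predict request to the model server. Below is a minimal sketch of that request (the input names "tokens" and "length" are my guess from the exported model's signature, and I'm skipping the SentencePiece tokenization that the real client applies first):

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:9000")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Build the Predict request for the "ende" model. The input names are an
    # assumption; check the SavedModel signature with saved_model_cli if unsure.
    tokens = ["Hello", "world", "!"]
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "ende"
    request.inputs["tokens"].CopyFrom(tf.make_tensor_proto([tokens], dtype=tf.string))
    request.inputs["length"].CopyFrom(tf.make_tensor_proto([len(tokens)], dtype=tf.int32))

    # Send the request asynchronously; future.result() is where the cuDNN error surfaces.
    future = stub.Predict.future(request, 30.0)
    result = future.result()
    print(result.outputs)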

I can't figure out what the problem is.

Thank you for your help.

Hi,

Did you use the same Docker image as in the example?

Hi,
Yes, I followed all the steps described in the example.
Also, I checked it with nmtWizard https://github.com/OpenNMT/nmt-wizard-docker/issues/46#issuecomment-456795844, and it works correctly.
I don't know if I missed something!

I just re-ran the example step by step and faced no errors.

Is it crashing on a specific input?
Can you post the logs of the Serving container?

No, it's just a simple "Hello world".

These are the log results:

2021-02-01 13:23:37.445062: I tensorflow_serving/model_servers/server.cc:87] Building single TensorFlow model file config: model_name: ende model_base_path: /models/ende
2021-02-01 13:23:37.446896: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2021-02-01 13:23:37.446908: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: ende
2021-02-01 13:23:37.547936: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: ende version: 1}
2021-02-01 13:23:37.548018: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: ende version: 1}
2021-02-01 13:23:37.548060: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: ende version: 1}
2021-02-01 13:23:37.548150: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/ende/1
2021-02-01 13:23:37.613383: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-02-01 13:23:37.613411: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Reading SavedModel debug info (if present) from: /models/ende/1
2021-02-01 13:23:37.613942: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-01 13:23:37.616946: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-02-01 13:23:37.648973: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:37.649498: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2021-02-01 13:23:37.649510: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2021-02-01 13:23:37.649550: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:37.650108: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:37.650532: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-01 13:23:39.301136: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-01 13:23:39.301218: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-02-01 13:23:39.301228: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-02-01 13:23:39.301512: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:39.301981: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:39.302554: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-01 13:23:39.303002: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4989 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-02-01 13:23:39.404600: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:199] Restoring SavedModel bundle.
2021-02-01 13:23:40.412292: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/ende/1
2021-02-01 13:23:40.654102: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:303] SavedModel load for tags { serve }; Status: success: OK. Took 3105958 microseconds.
2021-02-01 13:23:40.664746: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:174] Wrapping session to perform batch processing
2021-02-01 13:23:40.664769: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2021-02-01 13:23:40.664879: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/ende/1/assets.extra/tf_serving_warmup_requests
2021-02-01 13:23:40.673472: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: ende version: 1}
2021-02-01 13:23:40.679045: I tensorflow_serving/model_servers/server.cc:367] Running gRPC ModelServer at 0.0.0.0:9000 …
2021-02-01 13:23:56.412404: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-02-01 13:23:56.412565: W external/org_tensorflow/tensorflow/stream_executor/stream.h:2049] attempting to perform DNN operation using StreamExecutor without DNN support

Also, I installed requirements.txt, so my TensorFlow version is 2.0.4.

Thank you for your help.

Oops. I checked the logs; it seems it doesn't recognize my GPU.
It's strange because I don't get any error when I run Docker, so I didn't notice that.
Sorry.

For people who may have this problem in the future:
It seems the problem was tensorflow/serving:2.3.1.
I used tensorflow/serving:2.4.1 and my problem was solved.
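
In case it helps, a quick way to confirm that the model actually loaded on the new image is to query the model status over the same gRPC port. A minimal sketch, assuming the server is still listening on localhost:9000 with model name ende:

    import grpc
    from tensorflow_serving.apis import get_model_status_pb2, model_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:9000")
    stub = model_service_pb2_grpc.ModelServiceStub(channel)

    # Ask the server for the state of every loaded version of the "ende" model;
    # it should report AVAILABLE once the SavedModel has finished loading.
    request = get_model_status_pb2.GetModelStatusRequest()
    request.model_spec.name = "ende"
    print(stub.GetModelStatus(request, 10.0))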
