Dynamic library not found with TensorFlow 2.1 & CUDA 10.1

I have upgraded to TensorFlow 2.1, CUDA 10.1, and OpenNMT-tf 2.5.0. Training aborts after a succession of messages like the following:
2020-01-17 12:31:35.662290: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
Immediately after issuing the initial training command, I get the following:
"Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/lib64"
Before I start researching this, I'd like to ask whether anyone has had this issue with TensorFlow 2.1 and how they solved it. Thanks.

This message is only a warning and can be ignored, as that library (libnvinfer.so is part of TensorRT) is optional. See the TensorFlow documentation:

https://www.tensorflow.org/install/gpu
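
You can confirm that the GPU is still visible to TensorFlow despite this warning, e.g.:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If a GPU is listed, the missing libnvinfer.so.6 (TensorRT) is not what is blocking training.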

Maybe another process is using the GPU memory?
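
A quick way to check is to list the processes currently holding GPU memory, e.g.:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv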

No, nothing else is using the GPU memory. However, the last lines before the abort seem to indicate an issue:
2020-01-17 16:32:25.995603: I tensorflow/stream_executor/stream.cc:4938] [stream=0x6149fe0,impl=0x6148c50] did not memcpy host-to-device; source: 0x7f9af801ba40
2020-01-17 16:32:25.995651: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    self.captured_inputs)
Aborted

Running out of memory could be the underlying issue. Also make sure you have a driver version >= 418.39, which is the minimum required by CUDA 10.1.
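
If memory is the issue, a common workaround for CUBLAS_STATUS_NOT_INITIALIZED is to let TensorFlow grow its GPU memory allocation on demand instead of reserving it all upfront:

export TF_FORCE_GPU_ALLOW_GROWTH=true

This environment variable has the same effect as calling tf.config.experimental.set_memory_growth on each GPU before training starts.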

I have driver 418.87.01. Also, I am training a Transformer model in the other direction, with exactly the same settings and the same volume of data I used last week, before I upgraded. Using watch nvidia-smi I can see that most of the memory of the GTX 1080 Ti is in use right up to the abort. I see I need to investigate further.
The last messages before the abort are:
2020-01-17 18:53:16.016724: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-01-17 18:53:16.016772: I tensorflow/stream_executor/stream.cc:1990] [stream=0x5ca54a0,impl=0x5ca4d90] did not wait for [stream=0x5ca5260,impl=0x5ca4130]
2020-01-17 18:53:16.016789: I tensorflow/stream_executor/stream.cc:4938] [stream=0x5ca54a0,impl=0x5ca4d90] did not memcpy host-to-device; source: 0x7fa34b00a200
2020-01-17 18:53:16.016806: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(999, 512), b.shape=(512, 512), m=999, n=512, k=512
[[{{node transformer/self_attention_encoder/self_attention_encoder_layer/transformer_layer_wrapper/multi_head_attention/dense/MatMul}}]]
[[Func/gradients/global_norm/write_summary/summary_cond/then/_35/input/_100/_180]]
2020-01-17 18:53:16.016810: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
Aborted
(tfx_env) miguel@joshua:~$

In Github & Nvidia devtalk discussions this message “create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED” seems to be associated with the fact that the Cuda 10.1 installation puts libcublas in /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.10.1. Is it the case that TensorFlow looks for the Cublas libraries in /usr/local/cuda-10.1/lib64/libcublas.so.10.1? I have tried a symlink to that as suggested in relevant forums but that has not worked. The GPU memory seems to become exhausted by these repeated attempts to initialize Cublas. Did the OpenNMT-tf developers encounter this problem? If so, how was it solved?

I suggest trying to run the training inside a CUDA Docker image (e.g. nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04). If it works, it means the CUDA installation on the host system is somehow incorrect.
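
Something along these lines, assuming Docker 19.03+ with the NVIDIA container toolkit installed (the mount path is just an example):

docker run --gpus all -it --rm -v $PWD:/workspace nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 bash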

This issue was resolved by completely removing the CUDA 10.1 toolkit and reinstalling from scratch with cuda_10.1.243_418.87.00_linux.run. This installation puts the BLAS libraries in a place where TensorFlow 2.x can find them.
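
For anyone hitting the same problem, the runfile install looks roughly like this (installer flags may vary between versions; an interactive install works too):

sudo sh cuda_10.1.243_418.87.00_linux.run --toolkit --silent
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH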
