Dynamic library not found with TensorFlow 2.1 & Cuda 10.1

I have upgraded to TensorFlow 2.1, CUDA 10.1, and OpenNMT-tf 2.5.0. Training aborts after a succession of the following message:
2020-01-17 12:31:35.662290: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
Immediately after the initial train command I get the following:
Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/lib64
Before I start researching this I’d like to ask if anyone has had this issue with TensorFlow 2.1 and how they solved it. Thanks.

This message is a warning and can be ignored: libnvinfer belongs to TensorRT, which is an optional dependency. See the software requirements in the TensorFlow GPU support documentation.
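
To tell required libraries from optional ones on a given machine, a small check with the dynamic loader can help. This is only a sketch: the library names below are assumptions based on what a TensorFlow 2.1 build against CUDA 10.1 typically reports opening, and `can_load` and `report` are hypothetical helper names.

```python
import ctypes

# Assumed library names for TensorFlow 2.1 with CUDA 10.1; adjust for other builds.
REQUIRED = ["libcudart.so.10.1", "libcublas.so.10", "libcudnn.so.7"]
OPTIONAL = ["libnvinfer.so.6", "libnvinfer_plugin.so.6"]  # TensorRT, optional


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def report(names):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}
```

Calling `print(report(REQUIRED + OPTIONAL))` from the same shell as the training command uses the same LD_LIBRARY_PATH that TensorFlow sees. A False for an entry in OPTIONAL matches the warning above, while a False for an entry in REQUIRED would point to a real installation problem.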


Maybe another process is using the GPU memory?
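
One way to check this is to ask nvidia-smi which processes currently hold GPU memory. The sketch below assumes nvidia-smi is on the PATH and uses its CSV query output; `parse_compute_apps` and `gpu_processes` are hypothetical helper names.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split from both ends so a comma inside the process name is tolerated.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        # Some setups report a placeholder instead of a number; keep it as None.
        used_mib = int(used) if used.isdigit() else None
        apps.append((int(pid), name.strip(), used_mib))
    return apps


def gpu_processes():
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_compute_apps(out)
```

If `gpu_processes()` returns an empty list before training starts, no other compute process is holding memory on the GPU.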

No, nothing else is using the GPU memory. However, the last lines before the abort seem to indicate an issue:
2020-01-17 16:32:25.995603: I tensorflow/stream_executor/stream.cc:4938] [stream=0x6149fe0,impl=0x6148c50] did not memcpy host-to-device; source: 0x7f9af801ba40
File "/home/miguel/tf2_env/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
2020-01-17 16:32:25.995651: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed

Out of memory could be the underlying issue. Also make sure to have a driver version >= 418.39.
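
The driver check can be scripted. This is a minimal sketch that assumes nvidia-smi is on the PATH; the 418.39 minimum is the one given above, and the function names are hypothetical.

```python
import subprocess

MIN_DRIVER = "418.39"  # minimum driver version mentioned above


def version_tuple(version):
    """Turn a dotted version string such as '418.87.01' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_ok(installed, minimum=MIN_DRIVER):
    """Return True if the installed driver version is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_driver():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return out.splitlines()[0].strip()
```

On a machine with a GPU, `driver_ok(installed_driver())` gives a quick yes or no.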

I have driver 418.87.01. Also, I am now training a Transformer model in the other direction, with exactly the same settings and the same volume of data I used last week before I upgraded. Using watch nvidia-smi, I see that most of the memory of the GTX 1080 Ti is in use up to the moment of the abort. I see I need to investigate further.
The last messages before Abort are:
2020-01-17 18:53:16.016724: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-01-17 18:53:16.016772: I tensorflow/stream_executor/stream.cc:1990] [stream=0x5ca54a0,impl=0x5ca4d90] did not wait for [stream=0x5ca5260,impl=0x5ca4130]
2020-01-17 18:53:16.016789: I tensorflow/stream_executor/stream.cc:4938] [stream=0x5ca54a0,impl=0x5ca4d90] did not memcpy host-to-device; source: 0x7fa34b00a200
2020-01-17 18:53:16.016806: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(999, 512), b.shape=(512, 512), m=999, n=512, k=512
[[{{node transformer/self_attention_encoder/self_attention_encoder_layer/transformer_layer_wrapper/multi_head_attention/dense/MatMul}}]]
2020-01-17 18:53:16.016810: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
(tfx_env) miguel@joshua:~$
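
A note on reading the nvidia-smi figures: by default TensorFlow reserves nearly all of the GPU memory at startup, so high usage there does not by itself show that the model needs that much. A possible way to see the real footprint, sketched here as an assumption about the setup rather than a confirmed fix, is to enable on-demand allocation through the TF_FORCE_GPU_ALLOW_GROWTH environment variable before TensorFlow is imported.

```python
import os

# Must be set before TensorFlow is imported in this process; with it,
# TensorFlow grows its GPU allocation on demand instead of reserving
# almost all of the memory up front.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def allow_growth_enabled():
    """Return True if on-demand GPU memory growth is requested via the environment."""
    return os.environ.get("TF_FORCE_GPU_ALLOW_GROWTH", "").lower() == "true"
```

The same variable can be exported in the shell before launching the training command. If the abort still happens with growth enabled and low usage in nvidia-smi, memory exhaustion becomes a less likely explanation.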

In GitHub and NVIDIA DevTalk discussions, the message "failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED" seems to be associated with the fact that the CUDA 10.1 installation puts libcublas in /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.10.1. Does TensorFlow look for the cuBLAS libraries in /usr/local/cuda-10.1/lib64/libcublas.so.10.1? I have tried creating a symlink there, as suggested in the relevant forums, but that has not worked. The GPU memory seems to become exhausted by these repeated attempts to initialize cuBLAS. Did the OpenNMT-tf developers encounter this problem? If so, how was it solved?
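
To see where libcublas actually lives and whether that directory is on the loader path, a search along these lines may help. It is a sketch only: the candidate directories are the two mentioned in this thread, and `search_path_dirs` and `find_library` are hypothetical helper names.

```python
import glob
import os

# Directories mentioned in this thread; other installations may differ.
CANDIDATE_DIRS = [
    "/usr/local/cuda-10.1/lib64",
    "/usr/local/cuda/targets/x86_64-linux/lib",
]


def search_path_dirs(env_value):
    """Split an LD_LIBRARY_PATH-style string into unique directories, in order."""
    seen = []
    for directory in env_value.split(":"):
        if directory and directory not in seen:
            seen.append(directory)
    return seen


def find_library(pattern, dirs):
    """Return sorted paths matching `pattern` (e.g. 'libcublas.so*') in `dirs`."""
    hits = []
    for directory in dirs:
        hits.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(hits)
```

For example, `find_library("libcublas.so*", CANDIDATE_DIRS)` lists the cuBLAS files in either location, and `search_path_dirs(os.environ.get("LD_LIBRARY_PATH", ""))` shows which directories the loader will search. A library that is found on disk but sits in none of the searched directories would be consistent with the symptom described here.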

I suggest trying to run the training inside a CUDA Docker image (e.g. nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04). If it works, this means the CUDA installation on the system is somehow incorrect.

This issue was resolved by completely removing the CUDA 10.1 toolkit and installing from scratch with cuda_10.1.243_418.87.00_linux.run. This installation puts the cuBLAS libraries in a place where TensorFlow 2.* can find them.


Check whether other CUDA versions are installed on that machine. A conflict between multiple versions can cause this error. Uninstall them all and reinstall only the version you need.
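
A quick way to spot several toolkits side by side is to list the versioned directories under the install root. This sketch assumes the common /usr/local/cuda-<version> layout, which a particular system may not follow; `installed_cuda_toolkits` is a hypothetical helper name.

```python
import glob
import os
import re


def installed_cuda_toolkits(root="/usr/local"):
    """Return version strings for directories named cuda-<version> under `root`."""
    versions = []
    for path in glob.glob(os.path.join(root, "cuda-*")):
        match = re.fullmatch(r"cuda-(\d+(?:\.\d+)*)", os.path.basename(path))
        if match and os.path.isdir(path):
            versions.append(match.group(1))
    # Sort numerically so that 9.0 comes before 10.1.
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
```

More than one entry in the result of `installed_cuda_toolkits()` suggests that several toolkits coexist, which is worth ruling out before reinstalling.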