Hello,
I'm having an issue running OpenNMT-tf in Colab. It used to work, but for reasons I can't figure out, it doesn't anymore.
I have tried updating my libraries (TensorFlow and CTranslate2, both to 2.7.0), but to no avail.
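For reference, this is roughly the install cell I run at the top of the notebook; the exact command and pins are from memory, so treat it as an assumption rather than my literal setup:

# Colab cell (assumed): upgrade the libraries mentioned above
# OpenNMT-tf resolves to 2.22.0, per the log below
!pip install --upgrade tensorflow==2.7.0 ctranslate2==2.7.0 OpenNMT-tf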
Here are the logs I'm getting from running !onmt-main:
2021-11-12 02:50:00.698416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:00.706615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:00.707209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:00.832000: I main.py:304] Using OpenNMT-tf version 2.22.0
2021-11-12 02:50:00.832000: I main.py:304] Using model:
(model): TransformerBase(
(examples_inputter): SequenceToSequenceInputter(
(features_inputter): WordEmbedder()
(labels_inputter): WordEmbedder()
(inputters): ListWrapper(
(0): WordEmbedder()
(1): WordEmbedder()
)
)
(encoder): SelfAttentionEncoder(
(position_encoder): SinusoidalPositionEncoder(
(reducer): SumReducer()
)
(layer_norm): LayerNorm()
(layers): ListWrapper(
(0): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(1): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(2): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(3): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(4): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(5): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
)
)
(decoder): SelfAttentionDecoder(
(position_encoder): SinusoidalPositionEncoder(
(reducer): SumReducer()
)
(layer_norm): LayerNorm()
(layers): ListWrapper(
(0): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(1): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(2): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(3): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(4): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(5): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
)
)
)
2021-11-12 02:50:00.838204: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-12 02:50:00.838511: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:00.839285: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:00.839894: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:01.524324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:01.525011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:01.525583: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-12 02:50:01.526109: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-11-12 02:50:01.526158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15405 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
2021-11-12 02:50:01.527000: I main.py:312] Searching the largest batch size between 256 and 16384 with a precision of 256...
2021-11-12 02:50:01.531000: I main.py:312] Trying training with batch size 8320...
2021-11-12 02:50:31.404000: I main.py:312] ... failed.
2021-11-12 02:50:31.410000: I main.py:312] Trying training with batch size 4287...
2021-11-12 02:51:01.902000: I main.py:312] ... failed.
2021-11-12 02:51:01.908000: I main.py:312] Trying training with batch size 2271...
2021-11-12 02:51:32.245000: I main.py:312] ... failed.
2021-11-12 02:51:32.251000: I main.py:312] Trying training with batch size 1263...
2021-11-12 02:52:02.458000: I main.py:312] ... failed.
2021-11-12 02:52:02.463000: I main.py:312] Trying training with batch size 759...
2021-11-12 02:52:32.769000: I main.py:312] ... failed.
2021-11-12 02:52:32.776000: I main.py:312] Trying training with batch size 507...
2021-11-12 02:53:03.043000: I main.py:312] ... failed.
2021-11-12 02:53:03.044000: E main.py:312] Last training attempt exited with an error:
"""
2021-11-12 02:53:01.849184: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-12 02:53:01.851904: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/opennmt/bin/main.py", line 350, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/opennmt/bin/main.py", line 312, in main
hvd=hvd,
File "/usr/local/lib/python3.7/dist-packages/opennmt/runner.py", line 284, in train
moving_average_decay=train_config.get("moving_average_decay"),
File "/usr/local/lib/python3.7/dist-packages/opennmt/training.py", line 122, in __call__
self._steps(dataset, accum_steps=accum_steps, report_steps=report_steps)
File "/usr/local/lib/python3.7/dist-packages/opennmt/training.py", line 262, in _steps
loss = forward_fn()
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3040, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
ctx=ctx)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([1,500,512,1])
[[node transformer_base_1/self_attention_decoder_1/self_attention_decoder_layer_6/transformer_layer_wrapper_42/layer_norm_46/FusedBatchNormV3 (defined at /local/lib/python3.7/dist-packages/opennmt/layers/common.py:128) ]]
[[Func/gradients/global_norm/write_summary/summary_cond/then/_267/input/_804/_56]]
(1) Internal: cuDNN launch failure : input shape ([1,500,512,1])
[[node transformer_base_1/self_attention_decoder_1/self_attention_decoder_layer_6/transformer_layer_wrapper_42/layer_norm_46/FusedBatchNormV3 (defined at /local/lib/python3.7/dist-packages/opennmt/layers/common.py:128) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__forward_32883]
Function call stack:
_forward -> _forward
"""
Traceback (most recent call last):
File "/usr/local/bin/onmt-main", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/opennmt/bin/main.py", line 312, in main
hvd=hvd,
File "/usr/local/lib/python3.7/dist-packages/opennmt/runner.py", line 202, in train
training=True, num_replicas=num_replicas, num_devices=num_devices
File "/usr/local/lib/python3.7/dist-packages/opennmt/runner.py", line 151, in _finalize_config
mixed_precision=self._mixed_precision,
File "/usr/local/lib/python3.7/dist-packages/opennmt/runner.py", line 625, in _auto_tune_batch_size
"Batch size autotuning failed: all training attempts exited with an error "
RuntimeError: Batch size autotuning failed: all training attempts exited with an error (see last error above). Either there is not enough memory to train this model, or unexpected errors occured. Please try to set a fixed batch size in the training configuration.
After some additional debugging, it seems the problem is that Google Colab comes with CUDA 11.1, but TensorFlow 2.7 requires CUDA 11.2.
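For what it's worth, here is roughly how I checked the versions on the Colab runtime (commands from memory; the output will vary between runtimes):

# CUDA toolkit shipped with the runtime (this is where I saw 11.1)
!nvcc --version
# CUDA installations present on the VM
!ls /usr/local | grep -i cuda
# Driver-side view of the GPU
!nvidia-smi
# TensorFlow version actually imported by the notebook
!python -c "import tensorflow as tf; print(tf.__version__)"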
There is no TensorFlow release that matches Colab's CUDA version. If I try to change the TensorFlow/CTranslate2 version to 2.4 (the downgrade cell is sketched below the error), I get this error when running onmt-main:
/bin/bash: onmt-main: command not found
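For completeness, the downgrade cell mentioned above is roughly the following; the exact pins are an assumption on my part:

# Colab cell (assumed): downgrade attempt, after which onmt-main is no longer on PATH
!pip install tensorflow==2.4.0 ctranslate2==2.4.0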
I'm kind of stuck, as changing the CUDA version is a pain in Colab.