So, if I run the command directly in the Jupyter notebook it works, but otherwise it fails with this error:
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
[1]<stderr>:2022-11-15 16:04:28.990539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:28.990788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
[1]<stderr>:pciBusID: 0000:04:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
[1]<stderr>:coreClock: 1.725GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
[1]<stderr>:2022-11-15 16:04:28.990851: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:28.991095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
[1]<stderr>:pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
[1]<stderr>:coreClock: 1.725GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
and
[1]<stderr>:2022-11-15 16:04:28.995408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:28.995600: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.995726: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
[1]<stderr>:2022-11-15 16:04:28.995778: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.995855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
[1]<stderr>:2022-11-15 16:04:28.995956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:28.996092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
[0]<stderr>:2022-11-15 16:04:28.997044: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
[0]<stderr>:2022-11-15 16:04:28.997317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
[0]<stderr>:2022-11-15 16:04:28.997394: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
[0]<stderr>:2022-11-15 16:04:28.997447: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.997614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.997770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.997928: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:28.998050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
[1]<stderr>:2022-11-15 16:04:29.010274: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[0]<stderr>:2022-11-15 16:04:29.011479: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[0]<stderr>:2022-11-15 16:04:29.241000: I main.py:314] Using OpenNMT-tf version 2.21.0
and
[1]<stderr>:2022-11-15 16:04:29.245470: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
[1]<stderr>:To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[1]<stderr>:2022-11-15 16:04:29.245821: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[1]<stderr>:2022-11-15 16:04:29.245948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.246824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
[1]<stderr>:pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
[1]<stderr>:coreClock: 1.725GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
[1]<stderr>:2022-11-15 16:04:29.246840: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[1]<stderr>:2022-11-15 16:04:29.246862: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
[1]<stderr>:2022-11-15 16:04:29.246871: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
[1]<stderr>:2022-11-15 16:04:29.246878: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
[1]<stderr>:2022-11-15 16:04:29.246884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
[1]<stderr>:2022-11-15 16:04:29.246891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
[1]<stderr>:2022-11-15 16:04:29.246897: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
[1]<stderr>:2022-11-15 16:04:29.246904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
[1]<stderr>:2022-11-15 16:04:29.246965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.247180: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.247450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 1
[1]<stderr>:2022-11-15 16:04:29.707475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
[1]<stderr>:2022-11-15 16:04:29.707512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 1
[1]<stderr>:2022-11-15 16:04:29.707520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 1: N
[1]<stderr>:2022-11-15 16:04:29.707692: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.707854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.707989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[1]<stderr>:2022-11-15 16:04:29.708104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22378 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:0a:00.0, compute capability: 8.6)
[1]<stderr>:2022-11-15 16:04:29.708000: I main.py:323] Searching the largest batch size between 256 and 16384 with a precision of 256...
[1]<stderr>:2022-11-15 16:04:29.712000: I main.py:323] Trying training with batch size 8320...
[0]<stderr>:2022-11-15 16:04:29.730524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
[0]<stderr>:2022-11-15 16:04:29.730547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
[0]<stderr>:2022-11-15 16:04:29.730556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
[0]<stderr>:2022-11-15 16:04:29.730739: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:29.730914: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:29.731065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[0]<stderr>:2022-11-15 16:04:29.731194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22378 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:04:00.0, compute capability: 8.6)
[0]<stderr>:2022-11-15 16:04:29.731000: I main.py:323] Searching the largest batch size between 256 and 16384 with a precision of 256...
[0]<stderr>:2022-11-15 16:04:29.735000: I main.py:323] Trying training with batch size 8320...
[1]<stderr>:2022-11-15 16:04:50.086000: I main.py:323] ... failed.
[1]<stderr>:2022-11-15 16:04:50.092000: I main.py:323] Trying training with batch size 4287...
[0]<stderr>:2022-11-15 16:04:50.232000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:04:50.239000: I main.py:323] Trying training with batch size 4287...
[0]<stderr>:2022-11-15 16:05:10.519000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:05:10.524000: I main.py:323] Trying training with batch size 2271...
[1]<stderr>:2022-11-15 16:05:10.540000: I main.py:323] ... failed.
[1]<stderr>:2022-11-15 16:05:10.547000: I main.py:323] Trying training with batch size 2271...
[1]<stderr>:2022-11-15 16:05:30.812000: I main.py:323] ... failed.
[1]<stderr>:2022-11-15 16:05:30.817000: I main.py:323] Trying training with batch size 1263...
[0]<stderr>:2022-11-15 16:05:31.233000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:05:31.238000: I main.py:323] Trying training with batch size 1263...
[0]<stderr>:2022-11-15 16:05:51.488000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:05:51.493000: I main.py:323] Trying training with batch size 759...
[1]<stderr>:2022-11-15 16:05:51.757000: I main.py:323] ... failed.
[1]<stderr>:2022-11-15 16:05:51.762000: I main.py:323] Trying training with batch size 759...
[0]<stderr>:2022-11-15 16:06:12.087000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:06:12.092000: I main.py:323] Trying training with batch size 507...
[1]<stderr>:2022-11-15 16:06:12.462000: I main.py:323] ... failed.
[1]<stderr>:2022-11-15 16:06:12.468000: I main.py:323] Trying training with batch size 507...
[0]<stderr>:2022-11-15 16:06:32.391000: I main.py:323] ... failed.
[0]<stderr>:2022-11-15 16:06:32.392000: E main.py:323] Last training attempt exited with an error:
[0]<stderr>:
[0]<stderr>:"""
[0]<stderr>:2022-11-15 16:06:31.120468: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
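
For reference, here is a minimal standalone check I can run to see whether TensorFlow can create the cuBLAS/cuDNN handles at all outside of OpenNMT-tf. This is only a sketch and assumes a TensorFlow 2.x install; `tf.config.experimental.set_memory_growth` is the standard way to keep each worker from reserving the full GPU memory up front, which is a common cause of these handle-creation errors when two processes share the GPUs:

```python
import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand instead of grabbing
# the whole 24 GB per worker (sketch only, not part of the failing run).
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

with tf.device("/GPU:0"):
    # A matmul forces cuBLAS handle creation ...
    a = tf.random.normal([1024, 1024])
    print(tf.matmul(a, a).shape)
    # ... and a conv2d forces cuDNN handle creation.
    x = tf.random.normal([1, 64, 64, 3])
    k = tf.random.normal([3, 3, 3, 8])
    print(tf.nn.conv2d(x, k, strides=1, padding="SAME").shape)
```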