Hi,
I’m trying to run OpenNMT-py on an RTX 3090 from vast.ai and getting a CUDA error:
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_train", line 11, in <module>
    load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_train')()
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 172, in main
    train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 157, in train
    train_process(opt, device_id=0)
  File "/home/argosopentech/OpenNMT-py/onmt/train_single.py", line 109, in main
    trainer.train(
  File "/home/argosopentech/OpenNMT-py/onmt/trainer.py", line 224, in train
    for i, (batches, normalization) in enumerate(
  File "/home/argosopentech/OpenNMT-py/onmt/trainer.py", line 166, in _accum_batches
    num_tokens = batch.tgt[1:, :, 0].ne(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
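(For completeness, I believe the debugging flag suggested in the error message is just prepended to the training command, e.g. — with my_config.yaml standing in for my actual config file:

$ CUDA_LAUNCH_BLOCKING=1 onmt_train -config my_config.yaml
)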
I’m using an nvidia/cuda:11.3.0-devel-ubuntu20.04 Docker container and installing OpenNMT-py from source. Here is the nvidia-smi output from inside the container:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:47:00.0 Off |                  N/A |
|  0%   24C    P8    18W / 375W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
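For what it's worth, I think the relevant check is whether the installed torch wheel actually contains kernels for the 3090 (compute capability 8.6 / sm_86). Something like this should show it (assuming torch.cuda.get_arch_list() is available, which I believe it is from torch 1.7 onward):

$ python3
>>> import torch
>>> torch.__version__                    # installed torch version
>>> torch.version.cuda                   # CUDA version the wheel was built against
>>> torch.cuda.get_device_capability(0)  # should report (8, 6) for an RTX 3090
>>> torch.cuda.get_arch_list()           # needs to include 'sm_86' for the 3090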
The issue looks related to this PyTorch issue, but I’m using a newer graphics card than the people in that issue. OpenNMT-py requires torch>=1.6.0, and the newest version of torch is 1.9.0. Could that be the issue?
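If the problem is just that pip pulled the default torch wheel (which I believe is built against CUDA 10.2 and doesn't include sm_86 kernels), I'm guessing the fix is to explicitly install a CUDA 11 build before installing OpenNMT-py, something like:

$ pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

Does that sound right, or is there a recommended torch version for running OpenNMT-py on Ampere GPUs?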