OpenNMT Forum

All CUDA-capable devices are busy or unavailable

PyTorch version: 1.6.0+cu101
CUDA version: 10.1
NVIDIA driver: 418.56
OS: Ubuntu 18.04

Here's my training configuration:
-data ~~
-save_model ~~
-layers 6
-rnn_size 512
-word_vec_size 512
-transformer_ff 2048
-heads 8
-encoder_type transformer
-decoder_type transformer
-position_encoding
-train_steps 100000
-max_generator_batches 2
-dropout 0.1
-batch_size 1024
-batch_type tokens
-normalization tokens
-accum_count 2
-optim adam
-adam_beta2 0.998
-decay_method noam
-warmup_steps 8000
-learning_rate 2
-max_grad_norm 0
-param_init 0
-param_init_glorot
-label_smoothing 0.1
-valid_steps 1000
-save_checkpoint_steps 1000
-log_file data/${domain}/log/trn.`date +%y%m%d_%H%M%S`.log
-log_file_level INFO
-exp data/${domain}/exp.txt
-early_stopping 6
-early_stopping_criteria ppl
-tensorboard
-tensorboard_log_dir runs/onmt
-world_size 4
-gpu_ranks 0 1 2 3

The problem is with the GPUs.
With -world_size 1 and -gpu_ranks 0, the training command works.
However, with -world_size 4 and -gpu_ranks 0 1 2 3, I get the following error message.
Training only works on a single GPU (GPU 0); as soon as I use multiple GPUs, it fails:

  File "/data/home/asr/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable

import torch
torch.cuda.is_available()  # => True

With this simple check, all GPUs show up as available in torch.

Could anyone give me a hint about this problem?

Your GPUs are probably in “exclusive mode”, which is not compatible with the producer/consumer scheme currently used in the multi-GPU setup.

Thanks for your reply.
But I was able to run 8 GPUs at the same time with simple torch code.
Is it still possible that my GPUs are in exclusive mode?

What does nvidia-smi say?
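A small sketch for checking the mode of every GPU at once, assuming Python 3.6+ and `nvidia-smi` on the PATH. `--query-gpu=index,compute_mode --format=csv,noheader` are standard nvidia-smi options; in this CSV output a GPU in exclusive-process mode should report `Exclusive_Process` (the default nvidia-smi table abbreviates it as "E. Process"). The helper names (`parse_compute_modes`, `non_default_gpus`, `query_compute_modes`) are hypothetical.

```python
import subprocess

# Standard nvidia-smi query: one CSV line per GPU, e.g. "0, Default"
QUERY = ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"]


def parse_compute_modes(csv_text):
    """Map GPU index -> compute mode string from nvidia-smi CSV output."""
    modes = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def non_default_gpus(modes):
    """Return the sorted indices of GPUs whose compute mode is not 'Default'."""
    return sorted(i for i, mode in modes.items() if mode != "Default")


def query_compute_modes():
    """Run nvidia-smi and parse its output (needs the NVIDIA driver)."""
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_compute_modes(result.stdout)


if __name__ == "__main__":
    try:
        modes = query_compute_modes()
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not run nvidia-smi: {err}")
    else:
        for i in non_default_gpus(modes):
            print(f"GPU {i} is in {modes[i]} mode; multi-GPU training may fail")
```

Any GPU listed by this script is not in `Default` mode, so only a limited number of processes (or none) can use it.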

Is it possible that my NVIDIA driver was set with “nvidia-smi -c 2”?

“E. Process” means your GPUs are in exclusive-process mode, so each GPU can host only one process. But the current producer/consumer setup needs two processes per GPU: one training process per GPU, plus the producer process that hosts the queues for each GPU.
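The reasoning above can be made concrete with a toy model. This is a hypothetical sketch, not OpenNMT-py code: `MAX_PROCESSES`, `processes_per_gpu` and `conflicting_gpus` are made-up names, and it assumes (consistent with this thread, where `-world_size 1` works) that single-GPU training runs in one process without a producer. Exclusive-process mode limits each GPU to one process, but one process may still use several GPUs, which is why a single test script could drive all 8 GPUs.

```python
# Hypothetical toy model of the process layout; not OpenNMT-py internals.
# Maximum number of processes allowed on one GPU per compute mode
# (None means no limit).
MAX_PROCESSES = {"Default": None, "Exclusive_Process": 1, "Prohibited": 0}


def processes_per_gpu(gpu_ranks):
    """Which processes need the GPU, per GPU (simplified)."""
    if len(gpu_ranks) == 1:
        # Assumption: single-GPU training runs in one process, no producer.
        return {gpu_ranks[0]: ["trainer-0"]}
    # Multi-GPU: one trainer per GPU plus the shared producer process
    # that hosts a queue on every GPU.
    return {gpu: ["producer", f"trainer-{rank}"]
            for rank, gpu in enumerate(gpu_ranks)}


def conflicting_gpus(gpu_ranks, compute_modes):
    """GPUs whose compute mode allows fewer processes than the layout needs."""
    bad = []
    for gpu, procs in processes_per_gpu(gpu_ranks).items():
        cap = MAX_PROCESSES[compute_modes[gpu]]
        if cap is not None and len(procs) > cap:
            bad.append(gpu)
    return bad


if __name__ == "__main__":
    exclusive = {gpu: "Exclusive_Process" for gpu in range(4)}
    print(conflicting_gpus([0], exclusive))           # [] -> world_size 1 works
    print(conflicting_gpus([0, 1, 2, 3], exclusive))  # [0, 1, 2, 3] -> fails
```

Switching every GPU to `Default` removes the per-GPU cap, so the same four-GPU layout reports no conflicts.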

If you switch to default mode, it should work:


Thank you!
I tried your advice, and it works for me.