OpenNMT Forum

Unable to use -save_checkpoint_steps with gpu enabled



Hi!

In OpenNMT-py, the -save_checkpoint_steps option does not work when the -gpu_ranks 1 option is enabled. If I switch to plain CPU (by removing the -gpu_ranks option), it does work.

Can someone suggest how to make it work on the GPU? Training on the CPU takes a really long time.


(Guillaume Klein) #2


You should select the device ID with CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=1 python [...] -gpu_ranks 0


Thank you for the super-quick response.

When I do as you suggested, I get the error below:
THCudaCheck FAIL file=/pytorch/torch/csrc/cuda/Module.cpp line=34 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "", line 120, in
  File "", line 51, in main
    single_main(opt, 0)
  File "/home/ubuntu/algo/amit/workspace/OpenNMT-py/onmt/", line 79, in main
    opt = training_opt_postprocessing(opt, device_id)
  File "/home/ubuntu/algo/amit/workspace/OpenNMT-py/onmt/", line 72, in training_opt_postprocessing
  File "/home/ubuntu/anaconda3/envs/py36-th/lib/python3.6/site-packages/torch/cuda/", line 264, in set_device
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/torch/csrc/cuda/Module.cpp:34

I do have a working GPU, so something seems to be amiss. I can tell because when I run the training in OpenNMT-py with -gpu_ranks 1, it reports that it is using the GPU, and training runs about 10 times faster.


(Guillaume Klein) #4

Do you have a single GPU? Then just use -gpu_ranks 0.
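
For anyone else hitting this, the error above is consistent with how CUDA numbers devices. CUDA_VISIBLE_DEVICES lists physical GPU IDs, and the process then sees those GPUs renumbered starting from 0. On a single-GPU machine the only physical ID is 0, so CUDA_VISIBLE_DEVICES=1 hides every GPU, and a rank of 1 points past the only device. The sketch below models that renumbering in plain Python. The function names are hypothetical and this is not OpenNMT-py code; it assumes each -gpu_ranks value is used as a device index inside the visible set (which the set_device call in the traceback suggests), and it handles only integer IDs.

```python
from typing import List, Optional


def visible_devices(cuda_visible_devices: Optional[str],
                    physical_gpu_count: int) -> List[int]:
    """Return the physical GPU IDs a process can see, in renumbered order.

    Hypothetical helper: models CUDA_VISIBLE_DEVICES for integer IDs only.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible.
        return list(range(physical_gpu_count))
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_gpu_count:
            # An invalid entry hides itself and everything listed after it.
            break
        visible.append(int(token))
    return visible


def resolve_rank(gpu_rank: int,
                 cuda_visible_devices: Optional[str],
                 physical_gpu_count: int) -> int:
    """Map a zero-based rank to a physical GPU ID, or raise if out of range.

    Assumption: the rank is an index into the visible device list.
    """
    visible = visible_devices(cuda_visible_devices, physical_gpu_count)
    if not 0 <= gpu_rank < len(visible):
        raise ValueError(
            f"rank {gpu_rank} is out of range: {len(visible)} visible device(s)"
        )
    return visible[gpu_rank]
```

With one physical GPU, visible_devices("1", 1) returns an empty list, which matches the "no CUDA-capable device is detected" error, while resolve_rank(0, None, 1) returns 0. On a machine with two GPUs, resolve_rank(0, "1", 2) returns 1, which is the case the CUDA_VISIBLE_DEVICES=1 ... -gpu_ranks 0 command is meant for.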


Great! Thanks a lot! That worked like a charm.