GPU is not used



Hey there guillaumekln, I’m trying to run a model on a GPU, and I saw from this post that to do so I have to pass the -gpuid option with any ID that’s not 0. I don’t get any error when I run the model with -gpuid set to 1 or 2, but when I check with the nvidia-smi command I can see that my GPU isn’t being used. I’m also using the PyTorch version. Any pointers?


How to check if onmt-main is using my GPU
(Guillaume Klein) #2


What is the command line you used?


This is the command line that I used.

python -data cmn-eng/demo -save_model bidir_2layer_500u -src_word_vec_size 500 -tgt_word_vec_size 500 -encoder_type brnn -decoder_type rnn -rnn_size 500 -enc_layers 2 -dec_layers 2 -rnn_type LSTM -global_attention general -batch_size 64 -optim adam -adam_beta1 .9 -adam_beta2 .999 -dropout .4 -learning_rate .001 -report_every 200 -train_steps 20000 -gpuid 1

I did see from the documentation that -gpuid is deprecated and that I should use -world_size and -gpu_ranks instead. This is my first time using a GPU to train a model, so I’m not entirely sure what these two options do.



I was able to get it to work. Nonetheless, thank you for the fast response.


(Guillaume Klein) #5

Can you share what you found and how you fixed it for future users? Thanks.


Yeah sure. The big thing that I needed to figure out was what world_size and gpu_ranks meant. I rented an AWS p2.xlarge instance, which provides 1 GPU, and wanted to train an NMT model on a dataset of about 10 million examples. According to this post here, to use my GPU I first need to choose which GPU to expose by setting the environment variable with export CUDA_VISIBLE_DEVICES. Once this is done, in the training command line, world_size is the number of GPUs to use and gpu_ranks lists the GPUs in the order you want them to be used. Since I had 1 GPU, I used -world_size 1 -gpu_ranks 0, because the GPUs made visible through CUDA_VISIBLE_DEVICES are numbered starting from 0. More can be found with this link here.
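
For anyone who wants the rule above spelled out, here is a minimal sketch in Python. The helper name gpu_settings is hypothetical and is not part of OpenNMT. It only builds the CUDA_VISIBLE_DEVICES value and the -world_size / -gpu_ranks arguments described above, on the assumption that the ranks are simply 0..N-1 once the visible GPUs have been renumbered.

```python
def gpu_settings(physical_ids):
    """Build the environment variable and flags for the given physical GPU ids.

    Hypothetical helper for illustration only; it does not call OpenNMT.
    """
    if not physical_ids:
        raise ValueError("need at least one GPU id")
    # GPUs listed here are the only ones the process can see,
    # and they are renumbered starting from 0.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in physical_ids)}
    # world_size is the number of GPUs; ranks use the renumbered ids.
    ranks = [str(r) for r in range(len(physical_ids))]
    flags = ["-world_size", str(len(physical_ids)), "-gpu_ranks"] + ranks
    return env, flags


# Single-GPU case from this thread (one GPU, physical id 0).
env, flags = gpu_settings([0])
print("export CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"])
print(" ".join(flags))
```

With a single GPU this prints the same settings I ended up using: export CUDA_VISIBLE_DEVICES=0, then -world_size 1 -gpu_ranks 0 appended to the training command.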

(Vincent Nguyen) #7

all in here: