Hey there guillaumekln, I'm trying to run a model using a GPU, and I saw from this post that to do so I have to call the -gpuid option with any ID that's not 0. I don't get any error when I run the model with -gpuid set to 1 or 2, but I noticed that my GPU isn't actually being used when I verify with the nvidia-smi command. I'm also using the PyTorch version. Any pointers?
I did see in the documentation that -gpuid is deprecated and that I should use -world_size and -gpu_ranks instead. This is my first time using a GPU to train a model, so I'm not entirely sure what these two options do.
Yeah, sure. The big thing I needed to figure out was what world_size and gpu_ranks meant. I rented an AWS p2.xlarge instance, which provides 1 GPU, and wanted to train an NMT model on a dataset of about 10 million sentences. According to this post here, to use my GPU I first needed to select which GPU is visible by exporting CUDA_VISIBLE_DEVICES. Once that's done, in the training command, -world_size is the total number of GPUs to use and -gpu_ranks lists the ranks of the GPUs assigned to this process. Since I had 1 GPU, I used -world_size 1 -gpu_ranks 0, because CUDA_VISIBLE_DEVICES enumerates GPUs starting from 0. More can be found at this link here.
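For anyone else landing here, this is roughly what my setup looked like. It's a sketch, not a full recipe: the data path and model name (data/demo, demo-model) are placeholders, and I'm assuming the OpenNMT-py train.py entry point; adapt them to your own preprocessed files.

```shell
# Expose only the first GPU to PyTorch (a p2.xlarge has exactly one).
export CUDA_VISIBLE_DEVICES=0

# Single-GPU training: one process total (-world_size 1),
# and this process uses the GPU at rank 0 (-gpu_ranks 0).
# data/demo and demo-model are hypothetical placeholder paths.
python train.py -data data/demo -save_model demo-model \
    -world_size 1 -gpu_ranks 0
```

While it's training, nvidia-smi in another terminal should now show the python process occupying GPU 0.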