Using multiple GPUs in train.py


(Vincent Vandeghinste) #1

Hi,
I have a machine with 2 GPUs.
I have set
export CUDA_VISIBLE_DEVICES=0.1

When I run
python train.py -data /home/jerom/INT/Resources/Bijbel/preprocess_data -save_model /home/jerom/INT/Resources/Bijbel/rembrandt -encoder_type brnn -world_size 1 -gpu_ranks 0
this works fine and uses GPU 0, as confirmed by nvidia-smi.

But when I run
python train.py -data /home/jerom/INT/Resources/Bijbel/preprocess_data -save_model /home/jerom/INT/Resources/Bijbel/rembrandt -encoder_type brnn -world_size 2 -gpu_ranks 0 1

I get:
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main(opt)
  File "train.py", line 35, in main
    mp = torch.multiprocessing.get_context('spawn')
AttributeError: 'module' object has no attribute 'get_context'

Any help? Thanks.


(Eva) #2

You should have:

export CUDA_VISIBLE_DEVICES=0,1

using ',' instead of '.' to separate the device IDs.

This may solve your problem.
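A quick way to catch this kind of separator mistake is to validate the string before exporting it. The following is only a minimal sketch: parse_visible_devices is a hypothetical helper, not part of train.py or CUDA, and it assumes the value is a comma-separated list of plain integer indices.

```python
def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES-style string into integer device indices.

    Hypothetical helper for illustration; assumes plain integer indices
    and raises ValueError for anything else (such as "0.1").
    """
    ids = []
    for part in value.split(","):
        part = part.strip()
        if not part.isdigit():
            raise ValueError("not a device index: %r" % part)
        ids.append(int(part))
    return ids


print(parse_visible_devices("0,1"))  # prints [0, 1]

try:
    parse_visible_devices("0.1")
except ValueError as err:
    # "0.1" is one token and not an integer, so it is rejected.
    print("rejected:", err)
```

With "0,1" the helper returns two indices, while "0.1" is treated as a single invalid token.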


(Vincent Vandeghinste) #3

Thanks, but that was a typo in my post, not in the actual command.
So the problem is still there.


(Eva) #4

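Are you using Python 2.7 or Python 3?
Multi-GPU training is only supported on Python 3. The get_context function that train.py calls was added to multiprocessing in Python 3.4, so it does not exist on Python 2.7.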
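To check whether a given interpreter can run the multi-GPU code path, you can test for the attribute named in the traceback. This is a minimal sketch that uses the standard library's multiprocessing module rather than torch.multiprocessing, on the assumption that the torch module exposes the same function; supports_spawn_context is a hypothetical name.

```python
import multiprocessing
import sys


def supports_spawn_context():
    """Return True if this interpreter's multiprocessing has get_context."""
    return hasattr(multiprocessing, "get_context")


# Report the interpreter version next to the result of the check.
print("Python %d.%d" % sys.version_info[:2])
print("get_context available:", supports_spawn_context())
```

Run it with the same python executable you use for train.py; if the check reports False, switch to a Python 3 interpreter before trying -world_size 2 again.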


(Vincent Vandeghinste) #5

Thanks, that was it ;) Now it works.