Using multiple GPUs in train.py

vincent · November 30, 2018, 9:37am

Hi,
I have a machine with two GPUs.
I have set
export CUDA_VISIBLE_DEVICES=0.1

When I run
python train.py -data /home/jerom/INT/Resources/Bijbel/preprocess_data -save_model /home/jerom/INT/Resources/Bijbel/rembrandt -encoder_type brnn -world_size 1 -gpu_ranks 0
this works fine and uses GPU 0, as confirmed by nvidia-smi.

But when I run
python train.py -data /home/jerom/INT/Resources/Bijbel/preprocess_data -save_model /home/jerom/INT/Resources/Bijbel/rembrandt -encoder_type brnn -world_size 2 -gpu_ranks 0 1

I get:
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main(opt)
  File "train.py", line 35, in main
    mp = torch.multiprocessing.get_context('spawn')
AttributeError: 'module' object has no attribute 'get_context'

Any help? Thanks.

emartinezVic · November 30, 2018, 10:52am

You should have:

export CUDA_VISIBLE_DEVICES=0,1

using ',' instead of '.' to separate device IDs.

This may solve your problem.
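A quick way to catch this kind of separator mistake before launching training is to parse the variable yourself. The helper below is a hypothetical sketch, not part of OpenNMT or CUDA. It assumes the variable holds plain integer GPU indices (CUDA also accepts GPU UUIDs, which this sketch deliberately rejects) and raises on anything else, such as "0.1".

```python
def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES-style string into integer GPU indices.

    Hypothetical helper: accepts only comma-separated integer indices and
    raises ValueError for anything else (for example "0.1") instead of guessing.
    """
    ids = []
    for part in value.split(","):
        part = part.strip()
        # "0.1" is a single token containing a dot, so isdigit() rejects it.
        if not part.isdigit():
            raise ValueError(
                f"invalid device id {part!r} in {value!r}; "
                "separate GPU indices with commas, e.g. '0,1'"
            )
        ids.append(int(part))
    return ids
```

For example, parse_visible_devices("0,1") returns [0, 1], while parse_visible_devices("0.1") raises ValueError. You could call it on os.environ.get("CUDA_VISIBLE_DEVICES", "") at the top of a launch script.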

vincent · November 30, 2018, 10:56am

Thanks, but that was a typo in my report of the error, not in the actual commands.
So the problem is still there.

emartinezVic · November 30, 2018, 12:09pm

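Are you using Python 2.7 or Python 3?
Multi-GPU training is only supported on Python 3. That matches the traceback: torch.multiprocessing.get_context wraps the standard multiprocessing.get_context, which was added in Python 3.4, so under Python 2.7 the call fails with this AttributeError.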
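A sketch of failing early with a clearer message, assuming you control the code that creates the spawn context. It uses the standard multiprocessing module instead of torch.multiprocessing so it runs without PyTorch installed; torch.multiprocessing re-exports the same API, so the same check carries over. The function name get_spawn_context is hypothetical.

```python
import multiprocessing
import sys


def get_spawn_context():
    """Return a 'spawn' multiprocessing context, failing early on old Pythons.

    Hypothetical helper. multiprocessing.get_context() only exists on
    Python 3.4+, so on Python 2.7 calling it raises the AttributeError
    seen in the traceback above.
    """
    if sys.version_info < (3, 4):
        # %-formatting (not f-strings) keeps this file parseable on Python 2.7,
        # so the user gets this message instead of a SyntaxError.
        raise RuntimeError(
            "multi-GPU training needs Python 3.4+ "
            "(multiprocessing.get_context is missing on Python %d.%d)"
            % sys.version_info[:2]
        )
    # 'spawn' starts each worker in a fresh interpreter.
    return multiprocessing.get_context("spawn")
```

The version check avoids f-strings on purpose: under Python 2.7 the module must still parse for the RuntimeError to be raised at all.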

vincent · December 1, 2018, 2:54pm

Thanks, that was it. Now it works.