There were discussions about improving the -gpuid option, as it can be confusing, especially when using multiple GPUs. See for example:
What policy should we adopt?
My opinion is to convert this option to a boolean flag: simply -gpu, for example.
- When you have a single GPU (the most frequent use case), -gpuid already acts as a boolean.
- When you have multiple GPUs, you generally want to use CUDA_VISIBLE_DEVICES so that memory is not allocated on every GPU. Then you usually set -gpuid 1 and it also acts as a boolean.
Should we go this way? Obviously it is a breaking change, but we still allow that under semantic versioning.
Hmm, I’m not sure I like this.
Do we need to break -gpuid? It only allocates a very small amount of memory on the other GPUs. My group has been fine using -gpuid 2, etc. I know CUDA_VISIBLE_DEVICES is better, but do we need to break the current option?
Or should we drop the use of CUDA_VISIBLE_DEVICES and support a list of comma-separated identifiers in -gpuid, as I proposed in the issue?
That being said, not using CUDA_VISIBLE_DEVICES also spams the output of nvidia-smi on multi-GPU servers. Not so nice…
@jean.senellart What do you think?
the amount is not so small - it is about 250MB, which means that with 8 processes on 8 GPUs we are wasting 2GB of memory, which is too high.
I was about to say that we can hide CUDA_VISIBLE_DEVICES by using torch.setenv before loading cutorch, but there is something new, probably connected to the automatic use of THC_ALLOCATOR in Torch: the default memory grab no longer happens on recent versions of Torch. We see a single ~297MB allocation, only on the GPU we use, and only at the first memory allocation. @guillaumekln, can you double check?
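For reference, the torch.setenv idea mentioned above would look roughly like this. This is an untested sketch: torch.setenv is the call named in the comment, and the device id is an arbitrary example.

```lua
-- Sketch: hide the other GPUs from CUDA before cutorch is loaded,
-- so only the selected device gets initialized.
require('torch')
torch.setenv('CUDA_VISIBLE_DEVICES', '1')
require('cutorch')
-- From here on, cutorch only sees (and allocates on) the selected GPU.
```

The key constraint is ordering: the environment variable must be set before cutorch initializes the CUDA context, otherwise it has no effect.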
In that case, let us drop CUDA_VISIBLE_DEVICES completely, but also -nparallel, and extend the -gpuid syntax to -gpuid ID1,ID2,...,IDn. There is a little bit of work to do in Parallel.lua though.
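The parsing side of that extension is simple. A minimal sketch (the function name parseGpuid is illustrative, not the actual Parallel.lua API):

```lua
-- Split a "-gpuid ID1,ID2,...,IDn" value into a list of numeric
-- device identifiers, e.g. "1,2,4" -> {1, 2, 4}.
local function parseGpuid(str)
  local ids = {}
  for id in string.gmatch(str, '[^,]+') do
    table.insert(ids, tonumber(id))
  end
  return ids
end
```

Parallel.lua would then iterate over this list to spawn one worker per device.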
Indeed, recent Torch versions no longer allocate memory on every GPU, with or without THC_CACHING_ALLOCATOR enabled. Here was my simple test:
require('cutorch')
local t = torch.Tensor(1):cuda()
So this is good news, and we can now support a list of comma-separated identifiers with -gpuid.