In my opinion, we should convert this option to a boolean flag: simply -gpu, for example.
When you have a single GPU (the most frequent use case), -gpuid already acts as a boolean.
When you have multiple GPUs, you generally want to use CUDA_VISIBLE_DEVICES so that memory is not allocated on every GPU. Then you usually set -gpuid 1, and it again acts as a boolean.
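To illustrate the remapping, here is a minimal sketch (assuming cutorch is installed and CUDA_VISIBLE_DEVICES=1 was exported before launching the process):

```lua
-- With CUDA_VISIBLE_DEVICES=1 exported, CUDA renumbers the visible
-- devices, so the selected GPU becomes device 1 and -gpuid 1 is the
-- only meaningful value, i.e. the option behaves like a boolean.
require('cutorch')
print(cutorch.getDeviceCount()) -- 1: only the exported GPU is visible
cutorch.setDevice(1)            -- maps to the single exposed physical GPU
```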
Should we go this way? It is obviously a breaking change, but semantic versioning still allows us to make it.
Do we need to break -gpuid? -gpuid allocates a very small amount of memory on the other GPUs. My group has been fine using -gpuid 2 etc. I know CUDA_VISIBLE_DEVICES is better, but do we need to break the current option?
The amount is not so small: it is about 250 MB, which means that with 8 processes on 8 GPUs we are wasting about 2 GB of memory (8 × 250 MB), which is too high.
I was about to say that we could hide CUDA_VISIBLE_DEVICES by setting it with torch.setenv before loading cutorch, but there is something new, probably connected to the automatic use of THC_ALLOCATOR in Torch: there is no longer a default memory grab on recent versions of Torch. We see a first 297 MB allocation only on the GPU we actually use, and only at the first memory allocation. @guillaumekln, can you double-check?
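For reference, a minimal sketch of the env-var approach (assuming a setenv binding such as luaposix's posix.setenv, since plain Lua cannot modify the environment; torch.setenv would play the same role):

```lua
-- Restrict visibility to one physical GPU *before* cutorch
-- initializes CUDA, so nothing is allocated on the other devices.
local posix = require('posix')  -- luaposix, assumed available
posix.setenv('CUDA_VISIBLE_DEVICES', '2')
require('cutorch')   -- CUDA now enumerates a single device
cutorch.setDevice(1) -- device 1 is the remapped physical GPU 2
```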
In that case, let us drop CUDA_VISIBLE_DEVICES completely, and also -nparallel, and extend the -gpuid syntax to -gpuid ID1,ID2,...,IDn. There is a little bit of work to do in Parallel.lua, though.
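A rough sketch of what parsing the extended syntax could look like (the helper name is illustrative, not a proposal for the final API in Parallel.lua):

```lua
-- Hypothetical sketch: split a comma-separated -gpuid value into a
-- list of device ids; the number of entries would replace -nparallel.
local function parseGpuIds(value)
  local ids = {}
  for id in string.gmatch(value, '[^,]+') do
    table.insert(ids, assert(tonumber(id), 'invalid GPU id: ' .. id))
  end
  return ids
end

local gpuids = parseGpuIds('1,3,4') -- e.g. -gpuid 1,3,4
-- worker thread i would then call cutorch.setDevice(gpuids[i])
print(#gpuids) -- 3: implies 3 parallel workers
```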