Hi
I am running opennmt-tf from a python notebook.
I have 4 GPUs, so I want to run 4 training sessions in parallel, each using one GPU.
I set this:
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
but even so, it loads GPU ID 0 and then everything crashes.
It seems this is the expected behavior. When setting CUDA_VISIBLE_DEVICES=1, the process will see a single GPU with ID 0, but it is actually GPU 1 on the system.
Hi, I did that, but when I ran the code twice in parallel, with CUDA_VISIBLE_DEVICES set to a different ID in each run, both threads crashed, and I understood from the error message that they were getting mixed up. Is there another way to ensure they stay separated?
I am running from a notebook and not the command line.
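One thing to keep in mind: CUDA_VISIBLE_DEVICES is read per process, when CUDA is first initialized, so two runs started as threads inside the same notebook kernel share a single value of it and cannot each see a different GPU. A workaround is to spawn one child process per GPU and set the variable in each child's environment. A minimal sketch (the function name `launch_per_gpu` and the script name are placeholders, not part of opennmt-tf):

```python
import os
import subprocess
import sys

def launch_per_gpu(script, num_gpus):
    """Launch one copy of `script` per GPU, each seeing only its own GPU.

    Setting the variable in the child's environment (instead of via
    os.environ in a shared notebook kernel) keeps the runs isolated.
    """
    procs = []
    for gpu_id in range(num_gpus):
        env = os.environ.copy()
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        # Inside this child, the chosen GPU will appear as device ID 0.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        procs.append(subprocess.Popen([sys.executable, script], env=env))
    return procs

# Hypothetical usage from a notebook cell:
# procs = launch_per_gpu("train.py", 4)
# for p in procs:
#     p.wait()
```

Each child process then initializes CUDA with its own CUDA_VISIBLE_DEVICES value, so the trainings cannot get mixed, even when launched from a notebook.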