How to change data parallelism to model parallelism?


(Kelley Yin) #1

This program uses data parallelism, which is the best option for speeding up training.
However, I have to build two models in my program, and they do not both fit in the memory of a single GPU.
Therefore, I want to place them on different GPUs for training. The final loss consists of two parts: one part comes from each model separately, and the other is produced jointly by both models.

How can I implement this in this program?

(Guillaume Klein) #2

What have you tried so far? I think this is just a matter of carefully placing tensors on the correct device and moving them between devices.
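Here is a minimal sketch of that idea. It assumes the program uses PyTorch, which the thread does not say. `model_a` and `model_b` are hypothetical stand-ins for the two real models. Each model lives on its own GPU, and inputs are moved to that model's device. Outputs must be on the same device before they can be combined. If fewer than two GPUs are available, the sketch falls back to the CPU so it still runs.

```python
import torch
import torch.nn as nn

def pick_devices():
    """Return (cuda:0, cuda:1) if two GPUs exist, else (cpu, cpu)."""
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")

dev_a, dev_b = pick_devices()

# Hypothetical stand-ins for the two models; each is placed on its own device.
model_a = nn.Linear(16, 8).to(dev_a)
model_b = nn.Linear(16, 8).to(dev_b)

batch = torch.randn(4, 16)
# Each model must receive its inputs on the device its parameters live on.
out_a = model_a(batch.to(dev_a))
out_b = model_b(batch.to(dev_b))
# To combine outputs, first move one onto the other's device.
# .to() is differentiable, so gradients still flow back to dev_b.
combined = out_a + out_b.to(dev_a)
```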

(Kelley Yin) #3

You are right. I need to send the two models to different GPUs and distribute the data to the corresponding devices.
Finally, I also need to move the losses from the different GPUs onto a single GPU and add them up for backpropagation.
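The training step described above could look roughly like the sketch below. It again assumes PyTorch. The models, the MSE losses, and the form of the joint term (agreement between the two models' outputs) are hypothetical placeholders for the real ones. Each per-model loss is computed on its own device. All terms are then gathered on `dev_a` and summed, and a single `backward()` call propagates gradients to both GPUs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Use two GPUs if present; otherwise fall back to CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev_a, dev_b = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev_a = dev_b = torch.device("cpu")

# Hypothetical models; replace with the two real models.
model_a = nn.Linear(16, 8).to(dev_a)
model_b = nn.Linear(16, 8).to(dev_b)

# A single optimizer can hold parameters that live on different devices.
optimizer = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters()), lr=0.1
)

def train_step(inputs, targets):
    optimizer.zero_grad()
    out_a = model_a(inputs.to(dev_a))
    out_b = model_b(inputs.to(dev_b))

    # Per-model losses, each computed on that model's device.
    loss_a = F.mse_loss(out_a, targets.to(dev_a))
    loss_b = F.mse_loss(out_b, targets.to(dev_b))

    # Hypothetical joint term: agreement between the two models,
    # computed on dev_a after moving out_b there.
    loss_joint = F.mse_loss(out_a, out_b.to(dev_a))

    # Gather every term on one device, sum, and backpropagate once.
    total = loss_a + loss_b.to(dev_a) + loss_joint
    total.backward()
    optimizer.step()
    return total.item()

torch.manual_seed(0)
x, y = torch.randn(32, 16), torch.randn(32, 8)
losses = [train_step(x, y) for _ in range(20)]
```

Summing on one device keeps a single autograd graph, so one optimizer step updates both models consistently.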