How to change from data parallelism to model parallelism?

KelleyYin · November 12, 2018, 8:25am

This program uses data parallelism, which is a good option for speeding up training.
However, I have to build two models in my program, so they do not fit together in the CUDA memory of a single GPU.
Therefore, I want to place them on different GPUs for training. The final loss consists of two parts: one part comes from each of the two models separately, and the other is produced jointly by the two models.

How can I implement this in this program?

guillaumekln · November 19, 2018, 8:41am

What did you try so far? I think this should just be about carefully placing and moving tensors on the correct device.

KelleyYin · November 19, 2018, 11:22am

You are right. I need to send the two models to different GPUs and distribute the data to the corresponding devices.
Finally, I also need to move the losses from the different GPUs to the same GPU and add them up for back-propagation.
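
The steps described above could look roughly like the following. This is only a minimal sketch, assuming PyTorch (the thread does not name the framework). The two `nn.Linear` layers, the MSE losses, the dummy batch, and names such as `model_a`, `model_b`, `dev_a`, and `dev_b` are hypothetical placeholders, not code from the program being discussed. It falls back to the CPU when fewer than two GPUs are visible, so it can be run anywhere.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev_a, dev_b = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev_a = dev_b = torch.device("cpu")

# Hypothetical stand-ins for the two models, each placed on its own device.
model_a = nn.Linear(16, 4).to(dev_a)
model_b = nn.Linear(16, 4).to(dev_b)

# One optimizer can hold parameters that live on different devices.
optimizer = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters()), lr=0.1
)

# Dummy batch (assumption); each model receives a copy on its own device.
x = torch.randn(8, 16)
y = torch.randn(8, 4)
out_a = model_a(x.to(dev_a))
out_b = model_b(x.to(dev_b))

# Per-model losses, each computed on the device of its model.
loss_a = nn.functional.mse_loss(out_a, y.to(dev_a))
loss_b = nn.functional.mse_loss(out_b, y.to(dev_b))

# Joint loss: move model B's output to device A before combining the outputs.
loss_joint = nn.functional.mse_loss(out_a, out_b.to(dev_a))

# Gather all losses on one device and sum them. The .to() copies are part of
# the autograd graph, so gradients flow back to each model's own device.
total_loss = loss_a + loss_b.to(dev_a) + loss_joint

optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

Because the cross-device copies are differentiable, a single `backward()` call on the summed loss is enough; no manual gradient transfer between GPUs is needed in this setup.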