Multiple GPUs on OpenNMT-tf

Loek · November 16, 2017, 5:08pm

Okay, so I did as suggested and made my system multiboot. I've now got a fresh installation of Ubuntu, TensorFlow and OpenNMT-tf. The sample files are working, but they are using only one GPU, while TensorFlow detects two (SLI). -gpuid doesn't seem to work in OpenNMT-tf. How do I tell OpenNMT-tf to use both GPUs?

Loek · November 16, 2017, 5:10pm

Wait, it is using two GPUs automatically, it seems.
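One way to check whether both GPUs are really doing work, or are only visible to the process, is to restrict the training run to a single device and compare. The sketch below assumes the standard CUDA environment variable CUDA_VISIBLE_DEVICES, which controls which GPUs a CUDA program (including TensorFlow) can see. The helper name gpu_env is hypothetical and not part of OpenNMT-tf or TensorFlow.

```python
import os


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU indices.

    gpu_env is a hypothetical helper for illustration. It only sets
    CUDA_VISIBLE_DEVICES and leaves every other variable untouched.
    """
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # CUDA expects a comma-separated list of device indices, e.g. "0,1".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env
```

The result can be passed as the env argument of subprocess.run together with whatever training command you normally use, once with gpu_env([0]) and once with gpu_env([0, 1]). Watching nvidia-smi during each run then shows whether the second GPU changes utilization or speed.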

guillaumekln · November 17, 2017, 8:28am

Multi-GPU training is only provided via distributed training for the moment. It is inefficient and somewhat hard to set up.

May I ask why you don’t use OpenNMT, the original Lua version?

Loek · November 18, 2017, 4:15pm

I liked the cool graphics of TensorBoard, I wanted to make use of both GPUs and I thought TensorFlow was the hippest thing in town these days. Not correct?

guillaumekln · November 20, 2017, 8:37am

True, but that does not mean every application using TensorFlow is the best suited to your use case.

We still consider OpenNMT-tf experimental, and its current design targets advanced users. On the other hand, a lot of people, including non-technical people, have had success setting up and using OpenNMT-lua. I would recommend that one for now.

Loek · November 20, 2017, 8:51am

Thank you, Guillaume! In fact I had already set up the Lua version successfully before this. That all worked, but training took 59 hours (4.5-million-word corpus). So I wanted to try two things at once: get more insight into what is actually happening in the network using TensorBoard, and take advantage of the extra GPU power by using a dual-boot system instead of VirtualBox. After 8 hours of training though (everything is working nicely now in TensorFlow, thanks to your help!) it seems the GPUs won't work miracles. They are being recognized and used, but considering the quality of the translations of the current model, I still have a long way to go.

mayub · November 12, 2018, 8:11pm

Hi @guillaumekln,

Just a follow-up on your comment:

Do you have the steps to set up distributed training on HDFS?

Thanks!

Mohammed Ayub

guillaumekln · November 13, 2018, 8:17am

Hi,

First, please note that this comment is no longer true: multi-GPU training on a local machine has been supported for quite some time now.

For HDFS support, I can only point to the TensorFlow documentation:

https://www.tensorflow.org/deploy/hadoop

mayub · November 14, 2018, 1:50pm

Sounds good.