Multiple GPUs on opennmt-tf

Okay, so I did as suggested and made my system multiboot. I’ve now got a fresh installation of Ubuntu, TensorFlow and OpenNMT-tf. The sample files are working, but they are using only one GPU, while TensorFlow detects two (SLI). The -gpuid option doesn’t seem to work in OpenNMT-tf. How do I tell OpenNMT-tf to use both GPUs?

Wait, it is using two GPUs automatically, it seems.

Multi-GPU training is only provided via distributed training for the moment. It is inefficient and somewhat hard to set up.

May I ask why you don’t use OpenNMT, the original Lua version?

I liked the cool graphics of TensorBoard, I wanted to make use of both GPUs, and I thought TensorFlow was the hippest thing in town these days. Is that not correct?

True, but that does not mean every application using TensorFlow is best suited for your use case.

We still consider OpenNMT-tf experimental, and its current design targets advanced users. On the other hand, a lot of people, including non-technical people, have had success setting up and using OpenNMT-lua. I would recommend that one for now.

Thank you, Guillaume! In fact, I had already set up the Lua version successfully before this. That all worked, but training took 59 hours (on a 4.5 million word corpus). So I wanted to try two things at once: get more insight into what is actually happening in the network using TensorBoard, and take advantage of the extra GPU power by using a dual-boot system instead of VirtualBox. After 8 hours of training, though (everything is working nicely now in TensorFlow, thanks to your help!), it seems the GPUs won’t work miracles. They are being recognized and used, but judging by the quality of the translations from the current model, I still have a long way to go.

Hi @guillaumekln ,

Just a follow-up on your comment:

Do you have the steps to set up distributed training on HDFS?

Thanks !

Mohammed Ayub

Hi,

First, please note that this comment is no longer accurate: multi-GPU training on a single local machine has been supported for quite some time now.
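For reference, a minimal sketch of a local two-GPU training launch, written as a small Python wrapper around the command line. It assumes OpenNMT-tf 2.x, whose onmt-main entry point accepts --model_type, --config and --auto_config before the train run type, and --with_eval and --num_gpus after it. Older 1.x releases (the version this thread started on) order the arguments differently, so check onmt-main -h for your installed version. The function name build_train_command and the data.yml path are hypothetical placeholders.

```python
import shutil
import subprocess


def build_train_command(config_path, num_gpus, model_type="Transformer"):
    """Return the argv list for a local multi-GPU OpenNMT-tf 2.x training run.

    Hypothetical helper: assumes the onmt-main options --model_type, --config,
    --auto_config (global) and --with_eval, --num_gpus (after "train").
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "onmt-main",
        "--model_type", model_type,
        "--config", config_path,
        "--auto_config",
        "train",
        "--with_eval",
        "--num_gpus", str(num_gpus),  # replicate training on this many local GPUs
    ]


if __name__ == "__main__":
    # "data.yml" is a placeholder for your own data/training configuration file.
    cmd = build_train_command("data.yml", num_gpus=2)
    print(" ".join(cmd))
    # Only launch training when OpenNMT-tf is actually installed on this machine.
    if shutil.which("onmt-main"):
        subprocess.run(cmd, check=True)
```

Running the script prints the full command, so you can also copy it straight into a terminal instead of launching it from Python.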

For HDFS support, I can only point to the TensorFlow documentation:

https://www.tensorflow.org/deploy/hadoop

Sounds good.