Multiple GPUs on OpenNMT-tf


(Loek van Kooten) #1

Okay, so I did as suggested and made my system multiboot. I’ve now got a fresh installation of Ubuntu, TensorFlow and OpenNMT-tf. The sample files are working, but they are using only one GPU, while TensorFlow detects two (SLI). -gpuid doesn’t seem to work in OpenNMT-tf. How do I tell OpenNMT-tf to use both GPUs?

(Loek van Kooten) #2

Wait, it is using two GPUs automatically, it seems.

(Guillaume Klein) #3

Multi-GPU training is only provided via distributed training for the moment. It is inefficient and somewhat hard to set up.

May I ask why you don’t use OpenNMT, the original Lua version?

(Loek van Kooten) #4

I liked the cool graphics of TensorBoard, I wanted to make use of both GPUs, and I thought TensorFlow was the hippest thing in town these days. Not correct?

(Guillaume Klein) #5

True, but that does not mean every application using TensorFlow is the best suited for your use case.

We still consider OpenNMT-tf experimental, and its current design targets advanced users. On the other hand, a lot of people, including non-technical people, have had success setting up and using OpenNMT-lua. I would recommend this one for now.

(Loek van Kooten) #6

Thank you, Guillaume! In fact, I had already set up the Lua version successfully before this. That all worked, but training took 59 hours (4.5-million-word corpus). So I wanted to try two things at once: get more insight into what is actually happening in the network using TensorBoard, and take advantage of the extra GPU power by using a dual-boot system instead of VirtualBox. After 8 hours of training, though (everything is working nicely now in TensorFlow, thanks to your help!), it seems the GPUs won’t work miracles. They are being recognized and used, but considering the quality of the translations of the current model, I still have a long way to go.

(Mohammed Ayub) #7

Hi @guillaumekln,

Just a follow-up on your comment:

Do you have the steps to set up distributed training on HDFS?

Thanks!

Mohammed Ayub

(Guillaume Klein) #8


First, please note that this comment is no longer true: multi-GPU training on a local machine has been supported for quite some time.
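
As a rough illustration of what a local multi-GPU launch could look like, here is a minimal Python sketch that builds the environment and argument list for a training run. It assumes the installed OpenNMT-tf version accepts a `--num_gpus` option on its training command. The base command, model type and config file name are placeholders, and `multi_gpu_command` is a hypothetical helper, not part of OpenNMT-tf. Check the help output of your installed version for the exact argument order.

```python
import os
import shlex


def multi_gpu_command(base_args, gpu_ids):
    """Return (env, argv) for a training run restricted to the given GPUs.

    Hypothetical helper for illustration only; not part of OpenNMT-tf.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES limits which physical GPUs the process can see.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # Assumption: the training command accepts --num_gpus.
    argv = list(base_args) + ["--num_gpus", str(len(gpu_ids))]
    return env, argv


# Placeholder base command; adapt the model type and config to your setup.
base = ["onmt-main", "--model_type", "NMTSmall", "--config", "data.yml", "train"]
env, argv = multi_gpu_command(base, [0, 1])

# Print the equivalent shell line instead of launching anything.
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"],
      " ".join(shlex.quote(a) for a in argv))
```

To actually start training, the pair could be passed to `subprocess.run(argv, env=env)`. Printing the line first makes it easy to compare against the documentation before committing GPU time.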

For HDFS support, I can only point to the TensorFlow documentation:

(Mohammed Ayub) #9

Sounds good.