Advice on new GPU

performance

(Panos Kanavos) #1

Hi all,

Now that the new NVIDIA RTX GPUs are out, I’m thinking of upgrading or adding to my GTX 1080. My main concern with my current GPU is memory, as it is only marginally sufficient. So, if I get an RTX 2080 (8 GB) or another 1080, would the extra 8 GB be used in training, so that I can count on a total GPU memory pool of 16 GB?

Also, can the current implementations of OpenNMT (and particularly the Lua version) take advantage of the Tensor cores on the new RTX cards?

Thanks!


(Guillaume Klein) #2

Hi,

This is only partially true for multi-GPU training. You will be able to train with an effective batch size 2x larger, but you won’t be able to train a model 2x larger.
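To make the distinction concrete, here is a rough sketch of the arithmetic under plain data-parallel training, where each GPU holds a full copy of the model and processes its own batch. The function name and the batch and memory figures are illustrative assumptions, not measurements or code from OpenNMT.

```python
def data_parallel_summary(num_gpus, per_gpu_batch_size, per_gpu_memory_gb):
    """Summarize what data-parallel training gains from extra GPUs.

    Assumes each GPU keeps a full replica of the model, which is how
    synchronous data parallelism usually works.
    """
    return {
        # Every replica processes its own batch, so batch sizes add up.
        "effective_batch_size": num_gpus * per_gpu_batch_size,
        # The model must still fit on ONE card, so memory does not pool.
        "max_model_memory_gb": per_gpu_memory_gb,
    }


# Hypothetical setup: a GTX 1080 plus a second 8 GB card.
summary = data_parallel_summary(num_gpus=2, per_gpu_batch_size=64, per_gpu_memory_gb=8)
print(summary)
```

With two 8 GB cards the effective batch size doubles, but the largest model that fits is still bounded by a single card's 8 GB.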

Torch is no longer maintained, so new NVIDIA features are not supported. I suggest migrating to PyTorch (closest to Lua in terms of usage) or TensorFlow if you plan to update your GPU cards.


(Terence Lewis) #3

But presumably the Torch toolkit will still be available and accessible “as is”? I ask this because although I am experimenting with TensorFlow I can’t see that replacing our “Lua-based” production set-ups in the next few months. It would be good to know that if a machine fails we can still download Torch.


(Guillaume Klein) #4

Sure, it will still be available.


(jean.senellart) #5

Hello Terence, feel free to let us know about any issues or needs you may have so we can help with migrating to the OpenNMT-tf version. On our side, it is now the production environment and goes far beyond OpenNMT-lua. Feel free to DM if you want to discuss details!


(Panos Kanavos) #6

Thanks @guillaumekln, that’s what I thought. It seems I will need a card with 11 GB of memory. So, is there support in the TensorFlow or PyTorch version for the Tensor cores in the new RTX cards right now, or can it be added sometime in the future?

@jean.senellart: I’d like to start digging deeper into OpenNMT-tf – I have only experimented with a Transformer model, and it seemed to work fine. Still, my first impression of the TensorFlow version is that the separation of code and data is not very clear, and this may hinder fast adoption for production use. For this purpose, it would help to have thorough documentation like that of the Lua version on OpenNMT’s website, with various examples and structured workflows.
As I plan to migrate some time soon, here is a first question: is there something similar to hooks in the tf version, or some other means to add custom features on the fly? I just can’t do without domain control :slight_smile:


(Guillaume Klein) #7

It depends on the implementation of the underlying framework, i.e. PyTorch or TensorFlow. I’m not sure if they fully support Tensor cores yet.

What do you mean by data here?

Yes. It does not use the same logic but there are ways to inject external logic in the data preparation phase. But bear in mind that OpenNMT-tf does not come with a REST server like the Lua version. If you want to feel at home, OpenNMT-py is closer to OpenNMT-lua in that regard.
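As one possible way to do domain control in the data preparation phase, a domain tag can be prepended to each source sentence before training and translation. The sketch below is plain Python preprocessing, not an OpenNMT-tf API; the function names and the `<legal>` tag format are hypothetical.

```python
def add_domain_token(source_line, domain):
    """Prepend a domain tag to a tokenized source sentence.

    The tag format (e.g. "<legal>") is a hypothetical convention; the same
    tag must be used consistently at training and inference time.
    """
    token = "<{}>".format(domain)
    return "{} {}".format(token, source_line.strip())


def tag_corpus(lines, domain):
    """Apply the domain tag to every non-empty line of a corpus."""
    return [add_domain_token(line, domain) for line in lines if line.strip()]


tagged = tag_corpus(["the contract is void\n", "\n", "see clause 4\n"], "legal")
print(tagged)
```

The tag would also need to be present in the source vocabulary so the model can learn an embedding for it.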


(Vincent Nguyen) #8

Hey Panosk,

Nice to hear from you again.

If you want to buy a GPU, just wait a little bit. No one has seen any CUDA / RT / Tensor core tests on RTX so far.
We are all impatient, but I’ll buy one :slight_smile:

-tf and -py are both great frameworks, and PyTorch already supports Tensor cores. But for this I will need to adapt the code of OpenNMT-py to support FP16 and so on … that needs time…
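One reason FP16 support usually needs code changes is numeric range: very small gradients underflow to zero in half precision, which is why mixed-precision training commonly scales the loss. The snippet below only illustrates that effect with the standard library; the gradient value and scale factor are made-up numbers, and none of this is OpenNMT-py code.

```python
import struct


def to_fp16(value):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("e", struct.pack("e", value))[0]


tiny_gradient = 1e-8   # illustrative value below FP16's smallest subnormal
loss_scale = 1024.0    # hypothetical scale factor

# Stored directly in FP16, the gradient underflows to zero.
lost = to_fp16(tiny_gradient)

# Scaled up before the FP16 round trip, then scaled back down in full
# precision, the gradient survives with a small rounding error.
recovered = to_fp16(tiny_gradient * loss_scale) / loss_scale

print(lost, recovered)
```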

Anyway, as @guillaumekln says, the REST server is in there already.


(Panos Kanavos) #9

Well, I’m probably too accustomed to the Lua version :slight_smile: In tf it seems one has to create or edit Python files in order to create custom models, and the parameters can be placed in Python files, in YAML files, or on the command line. Sure, I fully understand the flexibility this offers, so I probably need to take a closer and more careful look at the framework to clear things up.

Hi Vincent!

Yes, I don’t plan to buy any RTX right now, I’ll wait for benchmarks – and hopefully some price drops :frowning_face: The problem is that gamers are not very happy with the RTX performance/price ratio and they are buying the remaining 1080 Ti cards, so soon I won’t have that choice.

Eventually I will get my feet wet with both the -py and -tf versions.


(Guillaume Klein) #10

Yes, this comes from the issues we had with the Lua command line: non-persistent options, unclear option scope and dependencies, and too many command line options (but never enough for power users). The proposed design addresses all of these.
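The general idea of persistent, file-based options that can still be overridden can be sketched as a layered merge. This is a generic illustration, not OpenNMT-tf's actual implementation; `merge_options` and all option names are hypothetical.

```python
import copy


def merge_options(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the one in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical option names, for illustration only.
file_options = {"train": {"batch_size": 64, "save_every": 5000}, "model_dir": "run1"}
cli_options = {"train": {"batch_size": 32}}

options = merge_options(file_options, cli_options)
print(options)
```

Options stored in a file persist between runs, and each override only touches the keys it names, which keeps the scope of every option explicit.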