CUDA translation support

Hi, I predominantly use CTranslate from OpenNMT and I noticed recent check-ins enabling GPU support. Is this something that is already available for use, or is it still being worked on? If the latter, I didn't see any instructions on how to compile with CUDA support and run on the GPU. Thank you!

Yes, it is already available for use. I will shortly update the instructions accordingly.

Basically, you just need to have CUDA installed on your system and recompile the project. You can then pass the --cuda flag or, if you use it as a library, set the corresponding argument when constructing the Translator.

Also see here:

It is important to know that only matrix multiplications are computed on the GPU, so some host<->device transfers are involved. This may result in poor performance depending on your hardware and model size.
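To make that concrete, here is a minimal standalone sketch (just an illustration, not CTranslate code) of a single offloaded multiplication with cuBLAS. The two cudaMemcpy calls around the GEMM are the transfers in question, and for small matrices they can easily dominate the compute:

```cpp
// Sketch: C = A * B on the GPU with explicit host<->device copies.
// Error checking omitted for brevity.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
  const int m = 512, k = 512, n = 512;
  std::vector<float> a(m * k, 1.0f), b(k * n, 1.0f), c(m * n, 0.0f);

  float *d_a, *d_b, *d_c;
  cudaMalloc((void**)&d_a, sizeof(float) * a.size());
  cudaMalloc((void**)&d_b, sizeof(float) * b.size());
  cudaMalloc((void**)&d_c, sizeof(float) * c.size());

  // Host -> device transfers: this is overhead that a pure CPU path avoids.
  cudaMemcpy(d_a, a.data(), sizeof(float) * a.size(), cudaMemcpyHostToDevice);
  cudaMemcpy(d_b, b.data(), sizeof(float) * b.size(), cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  const float alpha = 1.0f, beta = 0.0f;
  // cuBLAS is column-major; for this square all-ones example the layout
  // does not change the result.
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
              &alpha, d_a, m, d_b, k, &beta, d_c, m);

  // Device -> host transfer to get the result back.
  cudaMemcpy(c.data(), d_c, sizeof(float) * c.size(), cudaMemcpyDeviceToHost);

  cublasDestroy(handle);
  cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
  return 0;
}
```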

Hi @guillaumekln! That's awesome! Yeah, some instructions would be very helpful!

Yes, I see that it's only the matrix math for now, but that's a good start.

Actually, quantized math (specifically fixed point) is one of the things I want to try with CTranslate. I was hoping Eigen supported that, but the only thing I could find was some code in the Eigen extensions bundled with TensorFlow. Maybe that would also work for CTranslate to support it and get more throughput.

Another thing I was thinking of was a way to run something like the WMT validation inputs through a model to generate BLEU scores. For now I am just experimenting with single sentences, but the ability to generate an ‘accuracy’ number would be really awesome too!

Cheers

Could you share the TensorFlow/Eigen code you are talking about?

Technically, I have in mind to rely on the cublasGemmEx function. But the trickiest part is how to convert float to int8_t (and back) without hurting the model accuracy.
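Roughly something like this (an untested sketch, not actual CTranslate code; the symmetric 127 / max|x| scale is just one possible choice, and int8 GEMM also needs a GPU with DP4A support, i.e. compute capability 6.1+):

```cpp
// Sketch: int8 GEMM with int32 accumulation via cublasGemmEx.
// Choosing the float<->int8 scale is exactly the tricky part discussed above.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Symmetric per-matrix quantization: q = round(x * 127 / max|x|).
static std::vector<int8_t> quantize(const std::vector<float>& x, float& scale) {
  float max_abs = 0.f;
  for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
  scale = max_abs > 0.f ? 127.f / max_abs : 1.f;
  std::vector<int8_t> q(x.size());
  for (size_t i = 0; i < x.size(); ++i)
    q[i] = static_cast<int8_t>(std::lround(x[i] * scale));
  return q;
}

int main() {
  const int m = 128, k = 128, n = 128;  // int8 GEMM wants sizes divisible by 4
  std::vector<float> a(m * k, 0.5f), b(k * n, -0.25f);
  float scale_a, scale_b;
  std::vector<int8_t> qa = quantize(a, scale_a);
  std::vector<int8_t> qb = quantize(b, scale_b);

  int8_t *d_a, *d_b;
  int32_t *d_c;
  cudaMalloc((void**)&d_a, qa.size());
  cudaMalloc((void**)&d_b, qb.size());
  cudaMalloc((void**)&d_c, sizeof(int32_t) * m * n);
  cudaMemcpy(d_a, qa.data(), qa.size(), cudaMemcpyHostToDevice);
  cudaMemcpy(d_b, qb.data(), qb.size(), cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  const int32_t alpha = 1, beta = 0;
  cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
               &alpha, d_a, CUDA_R_8I, m, d_b, CUDA_R_8I, k,
               &beta, d_c, CUDA_R_32I, m,
               CUDA_R_32I, CUBLAS_GEMM_DFALT);

  // Recover floats: x ~= q / scale, so C_float = C_int / (scale_a * scale_b).
  std::vector<int32_t> c(m * n);
  cudaMemcpy(c.data(), d_c, sizeof(int32_t) * c.size(), cudaMemcpyDeviceToHost);
  std::printf("C[0][0] = %f\n", c[0] / (scale_a * scale_b));

  cublasDestroy(handle);
  cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
  return 0;
}
```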

Here is the link to what I found: https://github.com/tensorflow/tensorflow/tree/master/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint

So I played around with the original word2vec.c file and converted it to fixed point. Using the Google-trained word2vec model and the analogy tests, I was able to use 8-bit fixed point with only a 0.2% loss in accuracy.

However, there were some issues. Those tests need normalized vectors, which meant two things. First, all the values were by definition below 1, so I only had to encode the bits after the decimal point. Second, each vector component was very small, so quantizing to 8 bits caused a significant drop in accuracy, because I was only using the last few bits of my fixed point. To get around that, after the vectors were normalized, I just multiplied all the values by 2 so I could effectively use more of the fixed-point range. After I did that, I got the 0.2% loss in accuracy despite going from a 32-bit float to an 8-bit fixed point. I'm hoping to replicate that here for NMT.
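For what it's worth, here is a tiny sketch of that encoding (a reconstruction for illustration, not the actual patch I wrote):

```cpp
// Normalized vectors have |x| < 1, so the int8 stores only fraction bits.
// The extra factor of 2 pushes the small components into more of the range,
// clipping only the rare components with |x| >= 0.5.
#include <algorithm>
#include <cmath>
#include <cstdint>

const float kGain = 2.0f;  // the post-normalization "multiply by 2"

int8_t quantize(float x) {
  float scaled = x * kGain * 127.0f;
  return static_cast<int8_t>(
      std::lround(std::max(-127.0f, std::min(127.0f, scaled))));
}

float dequantize(int8_t q) {
  return q / (kGain * 127.0f);
}
```

Note that a uniform gain like this scales every dot product by the same constant, so the similarity rankings used by the analogy tests are unchanged.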