Hi, I predominantly use CTranslate from OpenNMT and I noticed recent check-ins for enabling GPU support. Is this something that is already available for use, or is it something that is still being worked on? If the latter, I didn't see any instructions on how to compile with CUDA support and run on the GPU. Thank you!
Yes, it is already available for use. I will shortly update the instructions accordingly.
Basically, you just need to have CUDA installed on your system and recompile the project. You can then use the `--cuda` flag or, if you use it as a library, set the appropriate argument when building the `Translator`.
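For the library route, it would look something like this (a rough sketch only: the header path and constructor signature below are placeholders, not the exact API, so check the actual headers in the repository):

```cpp
// Rough sketch only: the header path and constructor arguments here are
// placeholders, not the exact API -- check the headers in the repository.
#include <onmt/Translator.h>

int main()
{
  // The boolean mirrors the --cuda command line flag and enables the GPU
  // path for the matrix multiplications.
  onmt::Translator translator("enfr_model_release.t7", /*cuda=*/true);

  // ... translate as usual; only the backend changes.
  return 0;
}
```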
Also see here:
It is important to know that only the matrix multiplications are computed on the GPU, so some host<->device transfers are involved. This may result in poor performance depending on your hardware and model size.
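To make the transfer cost concrete, each offloaded multiplication follows roughly this pattern (an illustrative sketch, not the actual CTranslate code):

```cpp
// Illustrative sketch of a GPU-offloaded matrix multiplication: inputs are
// copied host->device, cuBLAS runs the GEMM, and the result is copied back.
// This is not the actual CTranslate code, just the general pattern.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void gpu_gemm(cublasHandle_t handle,
              const float* a, const float* b, float* c,
              int m, int n, int k)
{
  float *d_a, *d_b, *d_c;
  cudaMalloc(&d_a, sizeof(float) * m * k);
  cudaMalloc(&d_b, sizeof(float) * k * n);
  cudaMalloc(&d_c, sizeof(float) * m * n);

  // Host -> device transfers: this is the overhead mentioned above.
  cudaMemcpy(d_a, a, sizeof(float) * m * k, cudaMemcpyHostToDevice);
  cudaMemcpy(d_b, b, sizeof(float) * k * n, cudaMemcpyHostToDevice);

  const float alpha = 1.f, beta = 0.f;
  // Column-major GEMM: C = A * B.
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
              m, n, k, &alpha, d_a, m, d_b, k, &beta, d_c, m);

  // Device -> host transfer of the result.
  cudaMemcpy(c, d_c, sizeof(float) * m * n, cudaMemcpyDeviceToHost);

  cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
}
```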
Hi @guillaumekln! That's awesome! Yes, some instructions would be very helpful!
Yes, I see that it's only the matrix math for now, but that is a good start.
Actually, quantized math (specifically fixed-point) is one of the things I want to try with CTranslate. I was hoping Eigen supported it, but the only thing I could find was some code in TensorFlow:Eigen that does this. Maybe that would also work for CTranslate to support fixed-point and get more throughput.
Another thing I was thinking of was a way to run something like the WMT validation inputs through a model to generate BLEU scores. For now I am just experimenting with single sentences, but the ability to generate an 'accuracy' number would be really awesome too!
Cheers
Could you share the TensorFlow:Eigen code you are talking about?
Technically, I have in mind to rely on the `cublasGemmEx` function. But the trickiest part is how to convert `float` to `int8_t` (and the other way) without impacting the model accuracy.
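The conversion would be along the lines of symmetric max-abs scaling, for example (a minimal sketch of one possible scheme, nothing settled):

```cpp
// Minimal sketch of a symmetric float <-> int8_t quantization scheme
// (max-abs scaling). This is one possible approach, not a settled design.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize: map [-max_abs, +max_abs] onto [-127, 127].
std::vector<int8_t> quantize(const std::vector<float>& x, float& scale)
{
  float max_abs = 0.f;
  for (float v : x)
    max_abs = std::max(max_abs, std::fabs(v));
  scale = max_abs > 0.f ? 127.f / max_abs : 1.f;

  std::vector<int8_t> q(x.size());
  for (size_t i = 0; i < x.size(); ++i)
    q[i] = static_cast<int8_t>(std::lround(x[i] * scale));
  return q;
}

// Dequantize: recover approximate floats from the int8 values.
std::vector<float> dequantize(const std::vector<int8_t>& q, float scale)
{
  std::vector<float> x(q.size());
  for (size_t i = 0; i < q.size(); ++i)
    x[i] = q[i] / scale;
  return x;
}
```

With `cublasGemmEx`, the int8 products accumulate in int32, so each operand can keep its own scale and the result is rescaled by the product of the two scales afterwards.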
Here is the link to what I found: https://github.com/tensorflow/tensorflow/tree/master/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint
So I played around with the original word2vec.c file and converted it to fixed point. Using the Google-trained word2vec model and the analogy tests, I was able to use 8-bit fixed point with only a 0.2% loss in accuracy.
However, there were some issues. Those tests needed normalized vectors. Firstly, that meant all of the values were by definition below 1, so I only had to encode the part following the decimal point. Also, each vector value was very small, so quantizing them to 8 bits caused a significant drop in accuracy (because I was only using the last few bits of my fixed-point range). To get around that, after the vectors were normalized, I just multiplied all the values by 2, so I could effectively use more of the fixed-point range. After I did that, I got the 0.2% loss in accuracy despite going from a 32-bit float to an 8-bit fixed point. I'm hoping to replicate that here for NMT.
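In code, the encoding I ended up with was roughly this (a sketch from memory; the exact rounding and saturation details may have differed):

```cpp
// Sketch of the 8-bit fixed-point encoding from the word2vec experiment:
// vectors are L2-normalized (all components in [-1, 1]), then pre-scaled by 2
// to use more of the int8 range, then rounded with saturation.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

const float kPreScale = 2.f;  // empirical boost from the experiment

std::vector<int8_t> to_fixed(const std::vector<float>& v)
{
  // L2-normalize so every component is guaranteed to be in [-1, 1].
  float norm = 0.f;
  for (float x : v) norm += x * x;
  norm = std::sqrt(norm);
  if (norm == 0.f) norm = 1.f;

  std::vector<int8_t> q(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    // Encode only the fractional part: 127 steps over [-1, 1], with the
    // extra factor of 2 to occupy more of the low bits. Saturate in case
    // a large component overflows after pre-scaling.
    float scaled = (v[i] / norm) * kPreScale * 127.f;
    scaled = std::max(-127.f, std::min(127.f, std::round(scaled)));
    q[i] = static_cast<int8_t>(scaled);
  }
  return q;
}

// Dot products on the int8 vectors differ from the float ones only by the
// constant factor (kPreScale * 127)^2, which does not change analogy rankings.
```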