Why do I get different results on the same model when translating with CPU and GPU?
English - Marathi
English: Please speak more slowly
Marathi (CPU): कृपया धीरे धीरे बोल
Marathi (GPU): कृपया आणखी धीरे बोल
English: Yes, a little
Marathi (CPU): होय, थोडे
Marathi (GPU): हो, थोडे
English: We go
Marathi (CPU): आपण जाऊया
Marathi (GPU): आपण जातो
Can you provide more details?
What framework are you using? What are the model and options?
Apologies for the slow reply: we decided to run more tests to make sure the experiment was clean.
We use OpenNMT-tf 2.31.0.
Based on our experiments, we concluded that the CPU/GPU translation differences appear when we load the model with the int8 quantization format (the compute type).
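For reference, here is a minimal sketch of how such a comparison can be reproduced, assuming the model was exported to CTranslate2 (where the compute type option applies); the model path and source tokens below are placeholders, not the poster's actual setup:

```python
import ctranslate2

# Load the same exported model twice with int8 quantization:
# once on CPU and once on GPU ("enmr_ctranslate2/" is a placeholder path).
cpu_translator = ctranslate2.Translator(
    "enmr_ctranslate2/", device="cpu", compute_type="int8")
gpu_translator = ctranslate2.Translator(
    "enmr_ctranslate2/", device="cuda", compute_type="int8")

# Placeholder source tokens; real input must match the model's tokenization.
tokens = ["▁Please", "▁speak", "▁more", "▁slowly"]

# The two hypotheses can differ because the int8 kernels are
# implemented differently on CPU and GPU.
print(cpu_translator.translate_batch([tokens])[0].hypotheses[0])
print(gpu_translator.translate_batch([tokens])[0].hypotheses[0])
```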
This is mostly expected.
Numerical results will necessarily differ between the CPU and GPU runtimes, especially when running in a reduced precision such as int8, because small numerical variations can be amplified by the quantization. Running the model in full precision (float32) is the only way to minimize these translation differences.
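As an illustration, forcing full precision at load time would look like this (again assuming a CTranslate2 model; the path is a placeholder):

```python
import ctranslate2

# Keep the weights in float32 instead of quantizing to int8, which
# minimizes (but does not fully eliminate) CPU/GPU output divergence.
translator = ctranslate2.Translator(
    "enmr_ctranslate2/", device="cuda", compute_type="float32")
```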