We are happy to release v3.4.3, which brings much faster beam search inference.
In essence, we more than doubled the inference speed.
- v3.0.3 (Feb 2023)
- v3.3.0 (June 2023)
- v3.4.3 (Nov 2023)
All these numbers were obtained on an RTX 4090 for a vanilla EN-DE base Transformer.
The test set is the 3003 sentences of WMT14, with a beam_size of 4.
A few comments:
The reported tok/sec is measured inside the translator; it does not account for:
- Python interpreter startup/termination (about 1.5 sec on my system)
- Model loading (0.4 sec on my system)
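The timing methodology above can be sketched as follows. This is a minimal, hypothetical harness (the `translate` function is a stand-in, not OpenNMT-py's actual API): the timer starts only after the interpreter and model are already up, so startup and loading costs are excluded, exactly as in the reported tok/sec.

```python
import time

def translate(batch):
    # Hypothetical stand-in for the real translator call; a real benchmark
    # would invoke OpenNMT-py's beam search translator here.
    return [s[::-1] for s in batch]

def benchmark(batches):
    """Time only the translation loop: interpreter startup and model
    loading happen before the timer starts, so they are not counted."""
    n_tokens = 0
    start = time.perf_counter()  # started after model loading
    for batch in batches:
        out = translate(batch)
        n_tokens += sum(len(s.split()) for s in out)
    elapsed = time.perf_counter() - start
    return n_tokens, elapsed, n_tokens / elapsed

batches = [["ein kleiner Test", "noch ein Satz"]] * 10
tokens, secs, tok_per_sec = benchmark(batches)
```

Measuring end-to-end wall time instead (e.g. with `time` on the command line) would add roughly 1.9 sec of fixed overhead on my system, which matters at these short run times.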
I ran the same benchmark with CT2:
with a batch size of 960 examples, it takes 2.3 sec (6.1 sec wall time). To be fair we need to subtract the Python loading/termination: 6.1 sec - 1.5 sec = 4.6 sec.
So OpenNMT-py is still twice as slow as CT2 at this batch size, and three times as slow at batch size 32.
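The comparison arithmetic can be checked in a few lines. The CT2 wall time and the 1.5 sec overhead are from the measurements above; OpenNMT-py's absolute time is not stated directly, so the last line only derives what the "twice as slow" ratio implies.

```python
# Numbers from the post (batch size 960).
ct2_wall_s = 6.1       # CT2 end-to-end wall time
py_overhead_s = 1.5    # Python interpreter startup/termination
ct2_translate_s = ct2_wall_s - py_overhead_s  # fair translation-only time

# "Twice as slow" implies roughly this OpenNMT-py translation time
# (an inference from the stated ratio, not a reported measurement).
implied_onmt_s = 2 * ct2_translate_s
```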