Here are a few benchmarks for Transformer inference with CTranslate2 vs. OpenNMT-py. This is a first batch of results; the post might be updated later.
Inference with CTranslate2 is run through the CLI client (`ctranslate2/bin/translate`). Inference with OpenNMT-py is run through the `onmt_translate` entry point.
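For reference, the invocations look roughly like the following. This is a sketch, not the exact commands used for the benchmark: the flag names are the documented options of each tool, but the model and data paths are placeholders.

```shell
# CTranslate2: translate a tokenized source file with a converted model
# (placeholder paths; settings match the benchmark below).
ctranslate2/bin/translate --model ende_ctranslate2/ --src source.tok \
    --device cuda --batch_size 32 --beam_size 4

# OpenNMT-py: same settings through the onmt_translate entry point.
onmt_translate -model model.pt -src source.tok -output output.tok \
    -gpu 0 -batch_size 32 -beam_size 4
```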
Speeds are in target tokens per second.
GPU: GTX 1080
Beam size: 4
Batch size 32
| | Base (en-de) | Medium (en-es) | Big (en-fr) |
|---|---|---|---|
| OpenNMT-py (1.0.1) | 1491 | 1032 | 910 |
| CTranslate2 | 3078 | 1448 | 1128 |
| CTranslate2 (int8) | 2595 | 1578 | 1200 |
(I am not sure where the gap with the V100 benchmarks reported for OpenNMT-py comes from.)
Batch size 16
| | Base (en-de) | Medium (en-es) | Big (en-fr) |
|---|---|---|---|
| OpenNMT-py (1.0.1) | 1004 | 706 | 671 |
| CTranslate2 | 2693 | 1378 | 1157 |
| CTranslate2 (int8) | 1992 | 1397 | 1114 |
Batch size 8
| | Base (en-de) | Medium (en-es) | Big (en-fr) |
|---|---|---|---|
| OpenNMT-py (1.0.1) | 636 | 464 | 454 |
| CTranslate2 | 1915 | 1029 | 974 |
| CTranslate2 (int8) | 1339 | 980 | 840 |
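The int8 rows use a model quantized to 8-bit integers at conversion time. A sketch of how such a model can be produced with the converter that ships with CTranslate2 (the checkpoint and output paths are placeholders):

```shell
# Convert an OpenNMT-py checkpoint to a CTranslate2 model with int8 quantization
# (placeholder paths; without --quantization the weights are kept in float).
ct2-opennmt-py-converter --model_path model.pt \
    --output_dir ende_ctranslate2_int8 --quantization int8
```

The float and int8 models are then benchmarked with the same `translate` client, only pointing `--model` at the quantized directory.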