I just saw that NVIDIA also has an optimized tool, very similar to CTranslate2. Their samples are mostly based on OpenNMT. Does anyone know how the two are related?
FasterTransformer is a demo on how to run Transformer models with custom CUDA code. They just happen to use OpenNMT-tf for the translation task.
CTranslate2 has the same goal of accelerating Transformer models, but it comes with more features (notably CPU execution) and is more practical to integrate into real-world applications.
I see, thanks for the clarification.