This was a long-awaited release!
CTranslate2 is a custom C++ inference engine for OpenNMT models. It is a complete rewrite of the original CTranslate that makes it more extensible, more efficient, and fully GPU-compatible.
The project aims to be the fastest solution for running OpenNMT models on CPU and GPU, and to provide advanced control over memory usage and threading. More generally, it is a place for experimentation around model compression and inference acceleration.
CTranslate2 currently focuses on running standard Transformer models (as described in Vaswani et al., 2017) trained with OpenNMT-py or OpenNMT-tf. More model variants may be added in the future.
Please have a look at the README to learn more!