CTranslate2: accelerate inference of OpenNMT models

This was a long-awaited release!

CTranslate2 is a custom C++ inference engine for OpenNMT models. It is a complete rewrite of the original CTranslate to make it more extensible, efficient, and fully GPU compatible.

The project aims to be the fastest solution to run OpenNMT models on CPU and GPU, while providing fine-grained control over memory usage and threading. More generally, it is a place for experimentation around model compression and inference acceleration.

CTranslate2 currently focuses on running standard Transformer models (as in Vaswani et al. 2017) trained with OpenNMT-py or OpenNMT-tf. More model variants could be added in the future.
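To give a flavor of the Python API, here is a minimal sketch of translating with an already converted model; the model directory and tokens below are placeholders, and the exact structure of the returned results may differ between versions:

```python
import ctranslate2

# Load a model previously converted to the CTranslate2 format
# ("ende_ctranslate2/" is a placeholder path).
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

# Inputs must be tokenized the same way as the training data.
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0])
```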

Please have a look at the README to learn more!


This is awesome! Thank you!

I was thinking about this while on holiday last week. It’s fantastic to come back and find it’s done. Congratulations!

Hi @guillaumekln, before I get started with this, could you please tell me whether the model conversion scripts are included in the Docker images? Thanks.
Terence

Yes, they are, but if you are converting, say, an OpenNMT-tf model, you will also need a TensorFlow installation, which is not included in the Docker image.
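As a rough sketch of that conversion step (class names and signatures follow recent versions of the converters and may differ; the paths are placeholders):

```python
from ctranslate2.converters import OpenNMTPyConverter

# Convert an OpenNMT-py checkpoint to the CTranslate2 format.
# An analogous converter exists for OpenNMT-tf models, which is the
# case that additionally requires a TensorFlow installation.
converter = OpenNMTPyConverter("averaged-model.pt")
converter.convert("ende_ctranslate2")
```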