This was a long-awaited release!
CTranslate2 is a custom C++ inference engine for OpenNMT models. It is a complete rewrite of the original CTranslate that makes it more extensible, more efficient, and fully GPU-compatible.
The project aims to be the fastest solution for running OpenNMT models on CPU and GPU, and to provide advanced control over memory usage and threading. More generally, it is a place for experimentation around model compression and inference acceleration.
CTranslate2 currently focuses on running standard Transformer models (as described in Vaswani et al., 2017) trained with OpenNMT-py or OpenNMT-tf. More model variants may be added in the future.
Please have a look at the README to learn more!