Deploying CTranslate2 in production

Hi Everyone,

I was wondering if there are any resources that explain how ctranslate2 can be deployed for production use and scaled accordingly. I am most interested in micro-batching and autoscaling in a k8s environment.

It does not fall under the commonly supported ML frameworks for model serving in tools like bentoml/kserve/etc., so I was wondering whether it is a better idea to use ray-serve or something similar that is framework-agnostic.

Any insights on that topic?


I’m not aware of such resources for CTranslate2 specifically. The API surface of CTranslate2 is relatively simple so it could probably fit in many serving frameworks. Let us know if you encounter any issues during this integration.
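To illustrate how little glue a serving layer needs, here is a minimal, framework-agnostic micro-batching sketch using only the Python standard library. It is not an official CTranslate2 recipe: `MicroBatcher`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for this example, and the model call is replaced by a stand-in function so the snippet runs without a model. The commented lines show how the real `translate_batch` call could be plugged in, assuming a converted model directory.

```python
import asyncio
from typing import Callable, List

# Any function mapping a batch of tokenized sentences to tokenized outputs.
BatchFn = Callable[[List[List[str]]], List[List[str]]]


class MicroBatcher:
    """Collects concurrent requests and runs them as one batch (illustrative)."""

    def __init__(self, batch_fn: BatchFn, max_batch_size: int = 32,
                 max_wait_s: float = 0.01) -> None:
        # Construct this inside a running event loop (see main() below).
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, tokens: List[str]) -> List[str]:
        """Queue one request and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((tokens, future))
        return await future

    async def run(self) -> None:
        """Worker loop: wait for a request, gather more, run one batch."""
        loop = asyncio.get_running_loop()
        while True:
            items = [await self._queue.get()]
            # Simplification: always wait the full window so that other
            # requests can arrive, even if the batch could fill up sooner.
            await asyncio.sleep(self._max_wait_s)
            while len(items) < self._max_batch_size and not self._queue.empty():
                items.append(self._queue.get_nowait())
            inputs = [tokens for tokens, _ in items]
            try:
                # Run the blocking model call in a thread to keep the loop free.
                outputs = await loop.run_in_executor(None, self._batch_fn, inputs)
            except Exception as exc:
                for _, future in items:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), output in zip(items, outputs):
                if not future.done():
                    future.set_result(output)


def fake_translate(batch: List[List[str]]) -> List[List[str]]:
    """Stand-in for the model so the sketch runs without CTranslate2."""
    return [[token.upper() for token in tokens] for tokens in batch]

# With CTranslate2 installed, the stand-in could be replaced by something like:
#   translator = ctranslate2.Translator("model_dir")  # placeholder path
#   def ct2_translate(batch):
#       return [r.hypotheses[0] for r in translator.translate_batch(batch)]


async def main() -> None:
    batcher = MicroBatcher(fake_translate, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(
        *(batcher.submit(["hello", str(i)]) for i in range(6))
    )
    worker.cancel()
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

Frameworks such as Ray Serve ship their own request batching, so a hand-rolled loop like this is mainly useful for seeing what that layer does or for a plain HTTP server. Autoscaling would still be handled outside the process, for example by the k8s setup.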

Here are some possibly useful features available in CTranslate2: