I was wondering if there are any resources that explain how CTranslate2 can be deployed for production use and scaled accordingly. I am most interested in micro-batching and autoscaling in a k8s environment.
It is not among the ML frameworks most commonly supported for model serving by tools like BentoML, KServe, etc. So I was wondering whether it is a better idea to use Ray Serve or something similar that is framework-agnostic.
Any insights on that topic?