Nmt-wizard-docker - API

Please bear with me for these questions…

I’m slowly making my way through all the steps: tokenization, modeling, serving.

I succeeded at all the first steps, and I now have a solid web page that can use my model through CTranslate2. However, I’m not able to support more than one model, since there are size restrictions on the web page and the model has to be preloaded. So in some way I need to create an API somewhere that will handle my multiple models, and have my web interface call that API.

  1. Is my understanding correct to believe that nmt-wizard-docker can do just exactly that?
  2. Could I put this into Google Cloud and make an API?

I have zero experience with APIs/Docker… I actually learned what Docker was today… I’m not going to flood the forum with questions, but I just want to validate that I’m not already searching in the wrong direction.

Thank you!

Hi Samuel, I haven’t tried this set-up with CTranslate2, but TensorFlow allows serving multiple models via TensorFlow Serving (see Serving Configuration  |  TFX  |  TensorFlow). All the models on (very old models now) are served this way, all loaded at the same time on a consumer-grade PC in my workroom.
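For reference, a multi-model TensorFlow Serving set-up is driven by a model config file passed to the server with `--model_config_file`. A rough sketch (the model names and paths below are made up):

```
model_config_list {
  config {
    name: "en_de"
    base_path: "/models/en_de"
    model_platform: "tensorflow"
  }
  config {
    name: "en_fr"
    base_path: "/models/en_fr"
    model_platform: "tensorflow"
  }
}
```

Each `config` entry becomes a separately addressable model in the REST/gRPC API, all loaded by the same server process.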

1 Like

Dear Samuel,

For OpenNMT-py, there is this Simple OpenNMT-py REST server, which supports CTranslate2 as well.

Due to some subtle issues with the way the above-mentioned server works with CTranslate2, I ended up building a REST API with Flask. I have also experimented with FastAPI. I think I can fine-tune and publish the code later.

So for now, here are two suggestions if you want to use an API:

  • You try out the above-mentioned REST server; it might work well for your case; if not,
  • You can build something simple with FastAPI (First Steps - FastAPI).

Kind regards,

Thanks to both of you,

I will probably go and try FastAPI, as I really want to stick with CTranslate2. I’m using lots of its features.

Best regards,

This project includes a serving API, but it does not handle multiple models in a single instance.

It’s been a little while, but I just finished the API recently.
Now I’m having issues handling multiple models: Flask-Caching doesn’t seem to handle the Translator from CTranslate2.

I’m getting this error:

TypeError: cannot pickle 'ctranslate2.translator.Translator' object

At this time, I’m not finding any solution to handle multiple models with caching (for efficiency). Obviously, my code handles multiple models, but the API loads the required model on every call.
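One way around that pickling error is to not put the translators in Flask-Caching at all: Flask-Caching pickles values so it can hand them to a storage backend, and `ctranslate2.translator.Translator` objects can’t be pickled, but nothing stops you from holding the loaded objects in a plain per-process dict. A minimal sketch, where `loader` is a stand-in for something like `lambda n: ctranslate2.Translator(MODEL_DIRS[n])` (hypothetical paths):

```python
_translators = {}  # model name -> loaded translator, kept in process memory


def get_translator(name, loader):
    """Load a model once per process and reuse it on later calls.

    `loader` builds the translator for a given name, e.g.
    lambda n: ctranslate2.Translator(MODEL_DIRS[n]).
    """
    if name not in _translators:
        _translators[name] = loader(name)
    return _translators[name]
```

Because the cache is per-process, each Gunicorn/uWSGI worker loads its own copy of the models; with large models you may want a single worker, or a dedicated translation process that the web workers talk to.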