Nmt-wizard-docker - API

Please bear with me for these questions…

I’m slowly making my way through all the steps: tokenization, modeling, serving.

I succeeded at all the first steps, and I now have a solid web page that can use my model through CTranslate2. However, I’m not able to support more than one model, since there are size restrictions on the web page and the model has to be preloaded. So in some way I need to create an API somewhere that will handle my multiple models, and have my web interface call that API.

  1. Is my understanding correct to believe that nmt-wizard-docker can do just exactly that?
  2. Could I put this into Google Cloud and make an API?

I have zero experience with APIs/Docker… I actually learned what Docker was today… I’m not going to flood the forum with questions, but I just want to validate that I’m not already searching in the wrong direction.

Thank you!

Hi Samuel, I haven’t tried this set-up with CTranslate2, but TensorFlow allows serving multiple models via TensorFlow Serving (see Serving Configuration  |  TFX  |  TensorFlow). All the models on (very old models now) are served this way, all loaded at the same time on a consumer-grade PC in my workroom.
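For reference, a multi-model TensorFlow Serving set-up is driven by a model config file passed to the server with `--model_config_file`. A rough sketch (the model names and paths below are made up):

```
model_config_list {
  config {
    name: "en_de"
    base_path: "/models/en_de"
    model_platform: "tensorflow"
  }
  config {
    name: "en_fr"
    base_path: "/models/en_fr"
    model_platform: "tensorflow"
  }
}
```

Each `config` entry becomes a separately addressable model in the REST/gRPC API, all loaded by the same server process.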

1 Like

Dear Samuel,

For OpenNMT-py, there is this Simple OpenNMT-py REST server, which supports CTranslate2 as well.

Due to some subtle issues with the way the above-mentioned server works with CTranslate2, I ended up building a REST API with Flask. I have also experimented with FastAPI. I think I can fine-tune and publish the code later.

So for now, here are two suggestions if you want to use an API:

  • You try out the above-mentioned REST server; it might work well for your case; if not,
  • You can build something simple with FastAPI (First Steps - FastAPI).

Kind regards,

Thanks to both of you,

I will probably go and try FastAPI, as I really want to stick with CTranslate2. I’m using lots of its features.

Best regards,

This project includes a serving API, but it does not handle multiple models in a single instance.

It’s been a little while, but I just finished the API recently.
Now I’m having issues handling multiple models: Flask-Caching doesn’t seem to handle the Translator from CTranslate2.

I’m getting this error:

TypeError: cannot pickle 'ctranslate2.translator.Translator' object

At this time, I’m not finding any solution to handle multiple models with caching (for efficiency). Obviously, my code handles multiple models, but the API loads the required model on every call.
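One way around that pickling error is to not put the translators in Flask-Caching at all: Flask-Caching pickles values so it can hand them to a storage backend, and `ctranslate2.translator.Translator` objects can’t be pickled, but nothing stops you from holding the loaded objects in a plain per-process dict. A minimal sketch, where `loader` is a stand-in for something like `lambda n: ctranslate2.Translator(MODEL_DIRS[n])` (hypothetical paths):

```python
_translators = {}  # model name -> loaded translator, kept in process memory


def get_translator(name, loader):
    """Load a model once per process and reuse it on later calls.

    `loader` builds the translator for a given name, e.g.
    lambda n: ctranslate2.Translator(MODEL_DIRS[n]).
    """
    if name not in _translators:
        _translators[name] = loader(name)
    return _translators[name]
```

Because the cache is per-process, each Gunicorn/uWSGI worker loads its own copy of the models; with large models you may want a single worker, or a dedicated translation process that the web workers talk to.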