Loading models from memory

Hi guys,

Thanks for your hard work on CTranslate2.

In our API we have to load models from disk at every request:
    translator = ctranslate2.Translator(translation_model_path)

Would it be possible to load the model into memory only once and then point the translator at it, e.g. via a memory stream? That would save us the initial disk read delay on every request.

Thanks,
Sergei

Hi,

You should design your API to build the translator object once and use the same object in all API calls.
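
Something along these lines, as a minimal sketch (the model path and the request handler are just placeholders):

    import ctranslate2

    # Build the translator once, when the application starts.
    translator = ctranslate2.Translator("/path/to/model")

    def handle_request(source_tokens):
        # Every API call reuses the same translator object,
        # so the model is only loaded from disk once.
        return translator.translate_batch([source_tokens])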

Hi Guillaume,

Thanks for your reply.

In my application I have about 20 different models, and the Translator constructor requires a model_path, which is different for every model. Do you know how to reuse the same Translator object when all parameters are identical except the model_path?

I appreciate your reply!

Of course, if you have multiple models, you should create multiple Translator instances. However, you should try to create only one translator per model during the lifetime of your application.

For example, you can store the translators in a dictionary mapping language pairs to Translator instances. When your API is called, you can look up the corresponding ready-to-use translator in this dictionary.
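
A minimal sketch of this (paths and language pairs are placeholders):

    import ctranslate2

    # One Translator per model, all created once at startup.
    translators = {
        ("en", "de"): ctranslate2.Translator("/models/en-de"),
        ("en", "fr"): ctranslate2.Translator("/models/en-fr"),
        # ...one entry per language pair
    }

    def translate(source_lang, target_lang, source_tokens):
        # Look up the ready-to-use translator for this language pair.
        translator = translators[(source_lang, target_lang)]
        return translator.translate_batch([source_tokens])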


Thanks a lot, Guillaume!

As a feature request, please consider adding a Translator constructor that takes an input stream or input stream reader, as not all servers have a local disk attached.

In our particular case, we store our models in a GCS bucket and have to download them first. With an input stream, we could load a new Translator directly from the storage.
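
For reference, this is roughly what we do today (bucket name, prefix, and local paths are placeholders), which is the disk round trip we would like to avoid:

    import os
    import ctranslate2
    from google.cloud import storage

    client = storage.Client()
    os.makedirs("/tmp/en-de", exist_ok=True)

    # Download every file of the model to local disk first...
    for blob in client.list_blobs("our-models-bucket", prefix="en-de/"):
        if blob.name.endswith("/"):
            continue  # skip directory placeholder objects
        filename = os.path.basename(blob.name)
        blob.download_to_filename(os.path.join("/tmp/en-de", filename))

    # ...and only then load the translator from that local copy.
    translator = ctranslate2.Translator("/tmp/en-de")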

Thanks for the request.

Can you specify in more detail what this input stream would look like? What kind of structure or Python type do you get when streaming from GCS?

Unfortunately I'm no Python expert here, but this link might be helpful:

https://googleapis.dev/python/storage/latest/blobs.html

For reference, this is already possible with the C++ API, which offers a way to customize how the model files are read. It looks like the GCS C++ API could fit nicely into this usage.

We can consider bringing a similar functionality to Python.


There is now a way to load models from memory in the latest version, 3.3.0.

See the Python test, which can be used as an example. The file content can be either a bytes object or a binary file-like object (e.g. io.BytesIO).
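
For illustration, a minimal sketch based on that test (the model directory is a placeholder; here the file contents are read from local disk for simplicity, but the bytes could come from any source, such as a GCS bucket):

    import io
    import os
    import ctranslate2

    model_dir = "/path/to/model"

    # Read all model files into memory as binary file-like objects.
    files = {}
    for name in os.listdir(model_dir):
        with open(os.path.join(model_dir, name), "rb") as f:
            files[name] = io.BytesIO(f.read())

    # The files argument maps file names to their contents;
    # model_path is then only used as an identifier for the model.
    translator = ctranslate2.Translator(model_dir, files=files)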