In our API we have to load models from disk at every request: translator = ctranslate2.Translator(translation_model_path)
Would it be possible to load the model into memory only once and then point the translator at it, e.g. via a memory stream? That would save us the initial disk read delay.
In my application I have about 20 different models, and the Translator constructor requires a model_path, which is different for every model. Is there a way to reuse the same Translator object when all parameters are identical except for the model_path?
Of course, if you have multiple models you should create multiple Translator instances. However, you should try to create only one translator per model during the lifetime of your application.
For example, you can store the translators in a dictionary mapping language pairs to Translator instances. When your API is called, look up this dictionary to get the corresponding ready-to-use translator.
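A minimal sketch of that per-model cache, with the expensive loading call injected so the logic is visible on its own (the language pairs and model paths below are placeholders; in the real application the loader would be ctranslate2.Translator):

```python
class TranslatorPool:
    """Caches one ready-to-use translator per language pair."""

    def __init__(self, model_paths, load):
        # model_paths: dict mapping a language pair to its model directory.
        # load: the expensive model-loading call; in the real application
        #       this would be ctranslate2.Translator.
        self._model_paths = model_paths
        self._load = load
        self._cache = {}

    def get(self, pair):
        # Load the model on first use only; later calls reuse the instance.
        if pair not in self._cache:
            self._cache[pair] = self._load(self._model_paths[pair])
        return self._cache[pair]
```

Built once at startup, e.g. `pool = TranslatorPool(paths, ctranslate2.Translator)`, each API call can then serve requests with `pool.get(("en", "de")).translate_batch(batch)` without touching the disk again.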
As a feature request, please consider adding a Translator constructor that takes an input stream or stream reader, as not all servers have a local disk attached.
In our particular case, we store our models in a GCS bucket and have to download them first. With an input stream, we could load a new Translator directly from storage.
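In the meantime, our download step looks roughly like this; a sketch assuming the google-cloud-storage Python client, where the bucket and the model prefix are placeholders (the destination can be a tmpfs mount so nothing hits a physical disk):

```python
import os
import tempfile

def download_model(bucket, prefix, dest_dir=None):
    # bucket: a google.cloud.storage Bucket; prefix: the "directory"
    # in the bucket that holds one converted model.
    dest_dir = dest_dir or tempfile.mkdtemp(prefix="ct2-model-")
    for blob in bucket.list_blobs(prefix=prefix):
        # Recreate the blob layout under dest_dir.
        rel = os.path.relpath(blob.name, prefix)
        local_path = os.path.join(dest_dir, rel)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)
    return dest_dir
```

The Translator is then built from the local copy once, e.g. `translator = ctranslate2.Translator(download_model(bucket, "en-de"))`, and kept for the lifetime of the process.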
For reference, this is already possible with the C++ API, which offers a way to customize how the model files are read. It looks like the GCS C++ API could fit nicely into this usage.
We can consider bringing a similar functionality to Python.