Proper configuration for server

Hi,
I’ve been digging around in the server integration code for a while, but it’s not clear to me which arguments are required. I guess “model” and “ct2_model” are not both required at the same time…
Thanks

Hi there,

You can refer to the PR in which it was introduced.

Thanks!

Does it have ensembling options like PyTorch models do?

“model” -> “models”
“ct2_model” -> “ct2_models”??

Hmm I don’t think ensemble decoding is supported in CTranslate2. Not sure if it’s intended to be supported at some point @guillaumekln?

We don’t have plans to add ensemble decoding in CTranslate2.

Ok. Thanks!

Hi,

How can I load the CTranslate2 model on CPU only (like the device option in the CLI)? Are there any server options for inter_threads and intra_threads?

EDIT:
I’ve seen that setting the gpu option to -1 runs the translations on CPU.

Furthermore, the inter_threads and intra_threads options are hardcoded, both to 1. Changing these values to 4 doesn’t show the same behaviour as the Python API’s translate_file.

As stated in the CTranslate2 repository:

For CPU translations, the parameter inter_threads controls the number of batches a Translator instance can process in parallel. The translate_file method automatically takes advantage of this parallelization.
However, extra work may be needed when using the translate_batch method because multiple translations should be started concurrently from Python. If you are using a multithreaded HTTP server, this may already be the case. For other cases, you could use a ThreadPoolExecutor to submit multiple translations.

translate_batch is used in the server, but it is not parallelized; therefore inter_threads could only ever be 1.
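For reference, the pattern the quoted documentation describes can be sketched as follows. `fake_translate_batch` is a stand-in for calling `translate_batch` on a `ctranslate2.Translator` created with `inter_threads > 1`; all names here are illustrative, not the server’s actual code:

```python
# Sketch: submit several translate_batch calls concurrently, as the
# CTranslate2 documentation quoted above suggests.
from concurrent.futures import ThreadPoolExecutor

def fake_translate_batch(batch):
    # Stand-in for: translator.translate_batch(batch)
    # (here we just reverse each token sequence for demonstration)
    return [tokens[::-1] for tokens in batch]

def translate_parallel(batches, num_workers=4):
    # Keeping one in-flight batch per worker lets a Translator built with
    # inter_threads=num_workers process up to that many batches in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(fake_translate_batch, batches))
```

With a real `ctranslate2.Translator`, the speedup only appears when several batches are actually in flight at once, which is exactly what the server’s single `translate_batch` call per request does not do.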

How did you verify it is not parallelized? Did you send parallel translation requests to the server?

Should parallel translation requests behave like inter_threads?
If I have 10,000 sentences to translate, should I send them all together to the server, or should I send batches of, say, 32?

You should send multiple batch requests in parallel. For example, if you set inter_threads to 4, you should send at least 4 translation requests in parallel to fully utilize the server’s capacity.
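A client-side sketch of this advice: split the job into fixed-size batches and keep several requests in flight at once. `send_batch` is stubbed out here; a real client would POST JSON to your server’s translate endpoint (the URL and payload shape depend on your server setup and are not shown):

```python
# Sketch: chunk a large translation job and send the batches in parallel.
from concurrent.futures import ThreadPoolExecutor

def send_batch(batch):
    # Stand-in for an HTTP POST of `batch` to the translation server,
    # e.g. with urllib.request or the `requests` library.
    return [sentence + " (translated)" for sentence in batch]

def translate_all(sentences, batch_size=32, parallel_requests=4):
    batches = [sentences[i:i + batch_size]
               for i in range(0, len(sentences), batch_size)]
    # Keep at least `parallel_requests` (ideally >= inter_threads)
    # requests in flight so the server can use its parallelism.
    with ThreadPoolExecutor(max_workers=parallel_requests) as pool:
        results = pool.map(send_batch, batches)
    return [translation for batch in results for translation in batch]
```

So for 10,000 sentences: send many batches of ~32 concurrently rather than one giant request.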

Sorry, but… where do I set inter_threads to 4 in the server?

This is hardcoded for now (https://github.com/OpenNMT/OpenNMT-py/blob/dfdd4554cf1c9327c2c249ab11a978efe72a93d9/onmt/translate/translation_server.py#L91), but you can submit a PR with the few changes needed to make this configurable if you want.

Hello, I was trying to deploy a CTranslate2 model created by converting an OpenNMT-py model, but it seems that it still requires the path to the original OpenNMT-py model inside the server config. When testing the speed after adding the CTranslate2 model, it didn’t increase at all.

Here is the current config I am using:

    {
        "id": 5,
        "model": "en-ur80.pt",
        "timeout": 600,
        "on_timeout": "to_cpu",
        "load": false,
        "opt": {
            "batch_size": 1,
            "beam_size": 5,
            "replace_unk": true
        },
        "custom_opt": {
            "slc": "en",
            "tlc": "ur"
        },
        "ct2_model": "urt"
    },

Any idea of what is going wrong here?

@guillaumekln Any idea about this?

It’s probably because CTranslate2 is configured to use 1 thread by default, while PyTorch uses more threads. I propose an improvement to the default configuration here:


Hi Anurag!

Yes, this was an issue I reported earlier and Guillaume was so kind to investigate it.

FYI, I ended up using CTranslate2 directly, as you can see in this tutorial:

In some cases, e.g. if you use a non-Python web interface, you can build a simple REST API with Flask or FastAPI that receives requests from the web interface, uses CTranslate2 as the inference engine, and sends the translation(s) back to the web interface.

However, for demonstration purposes, I assume using CTranslate2 alone should be enough, as in the aforementioned tutorial.
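The request → translate → respond loop described above is small enough to sketch without any framework. Here is a dependency-free version using only the Python standard library, with the CTranslate2 call stubbed out; the endpoint path and JSON payload shape are illustrative, and in practice you would use Flask or FastAPI as mentioned:

```python
# Sketch: a minimal REST translation endpoint using only the stdlib.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def fake_translate(sentences):
    # Stand-in for translate_batch on a ctranslate2.Translator instance.
    return [s.upper() for s in sentences]

class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, translate, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"tgt": fake_translate(payload["src"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve_in_background():
    # Bind an ephemeral port and serve on a daemon thread.
    server = HTTPServer(("127.0.0.1", 0), TranslateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

def request_translation(port, sentences):
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/translate",
        data=json.dumps({"src": sentences}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tgt"]
```

Swapping `fake_translate` for a real `ctranslate2.Translator` call gives you the same shape as the Flask/FastAPI setup in the tutorial.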

Kind regards,
Yasmin


For what it’s worth I am also using CTranslate2 within a Flask server configuration for in-house purposes.
Terence
