OpenNMT Forum

Proper configuration for server

Hi,
I’ve been digging around in the integration code for a while, but it is not clear to me which arguments are required. I guess “model” and “ct2_model” are not both required at the same time…
Thanks

Hi there,

You can refer to the PR in which it was introduced.

Thanks!
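For reference, a minimal conf.json along the lines of that integration might look like this. This is only a sketch from my reading of the server code: the key names, in particular ct2_model, are assumptions, so double-check them against the PR.

```json
{
    "models_root": "./available_models",
    "models": [
        {
            "id": 100,
            "model": "averaged-10-epoch.pt",
            "ct2_model": "ende_ctranslate2",
            "timeout": 600,
            "load": true,
            "opt": {
                "gpu": -1,
                "beam_size": 5
            }
        }
    ]
}
```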

Does it have ensembling options like PyTorch models have?

“model” -> “models”
“ct2_model” -> “ct2_models”??

Hmm I don’t think ensemble decoding is supported in CTranslate2. Not sure if it’s intended to be supported at some point @guillaumekln?

We don’t have plans to add ensemble decoding in CTranslate2.

Ok. Thanks!

Hi,

How can I load the CTranslate2 model on CPU only (like the device option in the CLI)? Are there any options in the server for inter_threads and intra_threads?

EDIT:
I’ve seen that setting the gpu option to -1 executes the translations on CPU.

Furthermore, the inter_threads and intra_threads options are hardcoded, both to 1. Changing these values to 4 doesn’t show the same behaviour as translate_file from the Python API.

As stated in the CTranslate2 repository:

For CPU translations, the parameter inter_threads controls the number of batches a Translator instance can process in parallel. The translate_file method automatically takes advantage of this parallelization.
However, extra work may be needed when using the translate_batch method because multiple translations should be started concurrently from Python. If you are using a multithreaded HTTP server, this may already be the case. For other cases, you could use a ThreadPoolExecutor to submit multiple translations
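The ThreadPoolExecutor approach described above might look like the sketch below. The translate_batch function here is a stand-in for translator.translate_batch on a real ctranslate2.Translator created with inter_threads=4; it just echoes tokens so the sketch runs without a converted model.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_batch(batch):
    # Stand-in for translator.translate_batch(batch): upper-case each
    # token instead of actually decoding, so no model is needed.
    return [[tok.upper() for tok in sent] for sent in batch]

# Submit several batches concurrently so that up to inter_threads of
# them could be decoded in parallel by a real Translator.
batches = [
    [["hello", "world"]],
    [["good", "morning"]],
    [["how", "are", "you"]],
    [["see", "you", "soon"]],
]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(translate_batch, batches))
```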

translate_batch is used in the server but it is not parallelized, therefore inter_threads can effectively only be 1.

How did you verify it is not parallelized? Did you send parallel translation requests to the server?

Should parallel translation requests behave like inter_threads?
If I have 10000 sentences to translate, should I send them all together to the server, or should I send them in batches of, say, 32?

You should send multiple batch requests in parallel. For example, if you set inter_threads to 4, you should send at least 4 translation requests in parallel to fully utilize the server's capacity.
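On the client side, that could be sketched like this: cut the 10000 sentences into batches of 32 and keep several requests in flight at a time. The send_request function is a stand-in for an HTTP POST of one batch to the server (e.g. with urllib or requests); here it just counts sentences so the sketch runs without a live server.

```python
from concurrent.futures import ThreadPoolExecutor

def send_request(batch):
    # Stand-in for posting one batch to the translation server and
    # returning its result; here we just report the batch size.
    return len(batch)

sentences = [f"sentence {i}" for i in range(10000)]
batch_size = 32

# Cut the 10000 sentences into batches of 32...
batches = [sentences[i:i + batch_size]
           for i in range(0, len(sentences), batch_size)]

# ...and keep 4 requests in flight at a time, matching inter_threads=4.
with ThreadPoolExecutor(max_workers=4) as pool:
    translated = sum(pool.map(send_request, batches))
```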

Sorry, but… where do I set inter_threads to 4 in the server?

This is hardcoded for now (https://github.com/OpenNMT/OpenNMT-py/blob/dfdd4554cf1c9327c2c249ab11a978efe72a93d9/onmt/translate/translation_server.py#L91), but you can submit a PR with the few changes needed to make this configurable if you want.
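Such a change might boil down to something like this sketch, which reads the thread counts from the per-model configuration instead of hardcoding them to 1. The option names and the opt dict shape are hypothetical, not the actual translation_server.py code.

```python
def ct2_translator_args(opt):
    # Hypothetical: build the CTranslate2 Translator thread arguments
    # from a per-model config dict, defaulting to the current behaviour
    # (both hardcoded to 1) when the options are absent.
    return {
        "inter_threads": opt.get("inter_threads", 1),
        "intra_threads": opt.get("intra_threads", 1),
    }

# Example: a model config that requests 4 parallel batch workers.
args = ct2_translator_args({"inter_threads": 4})
```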