Proper configuration for server

I’ve been digging through the integration code for a while, but it is still not clear to me which arguments are necessary. I guess “model” and “ct2_model” are not required at the same time…

You can refer to the PR in which it was introduced.


Does it have ensembling options like PyTorch models do?

“model” -> “models”
“ct2_model” -> “ct2_models”??

Hmm I don’t think ensemble decoding is supported in CTranslate2. Not sure if it’s intended to be supported at some point @guillaumekln?

We don’t have plans to add ensemble decoding in CTranslate2.

How can I load the CTranslate2 model on CPU only (the device option in the CLI)? Are there any options in the server for inter_threads and intra_threads?

I’ve seen that setting the gpu option to -1 runs the translations on CPU.

Furthermore, the inter_threads and intra_threads options are hardcoded, both to 1. Changing both values to 4 doesn’t give the same behaviour as translate_file from the Python API.

As stated in the CTranslate2 repository:

For CPU translations, the parameter inter_threads controls the number of batches a Translator instance can process in parallel. The translate_file method automatically takes advantage of this parallelization.
However, extra work may be needed when using the translate_batch method because multiple translations should be started concurrently from Python. If you are using a multithreaded HTTP server, this may already be the case. For other cases, you could use a ThreadPoolExecutor to submit multiple translations
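To make the quoted advice concrete, here is a minimal sketch of submitting several batches concurrently with a ThreadPoolExecutor. The helper names (chunks, translate_parallel, fake_translate_batch) are made up for illustration and are not part of CTranslate2. The stand-in function only reverses tokens so the example runs without a model; in practice you would pass the translate_batch method of a Translator created with inter_threads greater than 1.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_parallel(translate_batch, sentences, batch_size=32, workers=4):
    """Call `translate_batch` on several batches at once, keeping input order."""
    batches = list(chunks(sentences, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, even if a later batch
        # finishes before an earlier one.
        results = pool.map(translate_batch, batches)
        return [hyp for batch_result in results for hyp in batch_result]


# Hypothetical stand-in for a real translator: it just reverses each
# tokenized sentence, so the sketch is runnable on its own.
def fake_translate_batch(batch):
    return [tokens[::-1] for tokens in batch]


sentences = [["a", "b"], ["c"], ["d", "e", "f"]]
output = translate_parallel(fake_translate_batch, sentences, batch_size=2, workers=2)
```

Setting workers to the same value as inter_threads is a reasonable starting point, since extra Python threads beyond that would presumably just wait for a free translation slot.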

translate_batch is used in the server, but it is not parallelized, so inter_threads could only be 1.

How did you verify it is not parallelized? Did you send parallel translation requests to the server?

Should parallel translation requests behave like inter_threads?
If I have 10000 sentences to translate, should I send them all together to the server, or should I send batches of, say, 32?

You should send multiple batch requests in parallel. For example if you set inter_threads to 4, you should send at least 4 translation requests in parallel to fully utilize the server capability.
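On the client side, that could look roughly like the sketch below: split the sentences into batches of 32 and keep 4 requests in flight at a time. The URL, the model id, and the JSON payload shape are assumptions about a typical deployment, not something confirmed in this thread, so check them against your own server before using this.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed values: adjust both to match your own server configuration.
SERVER_URL = "http://localhost:5000/translator/translate"
MODEL_ID = 100


def build_payload(batch, model_id=MODEL_ID):
    """Build one request body (assumed format: a list of {"src", "id"} objects)."""
    return [{"src": sentence, "id": model_id} for sentence in batch]


def post_batch(batch, url=SERVER_URL):
    """Send one batch to the server and return the decoded JSON response."""
    data = json.dumps(build_payload(batch)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def translate_all(sentences, send=post_batch, batch_size=32, parallel_requests=4):
    """Split `sentences` into batches and keep several requests in flight."""
    batches = [
        sentences[i:i + batch_size] for i in range(0, len(sentences), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=parallel_requests) as pool:
        # One response per batch, returned in the same order as the batches.
        return list(pool.map(send, batches))
```

Calling translate_all(my_sentences) with parallel_requests equal to the server's inter_threads should keep every translation slot busy. The send parameter exists only so the batching logic can be tried without a running server.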

Sorry, but… where do I set inter_threads to 4 in the server?

This is hardcoded for now, but you can PR the few changes to make this configurable if you want.