OpenNMT Forum

Proper configuration for server

Hi,
I’ve been digging around in the integration code for a while, but it is not clear to me which arguments are required. I guess “model” and “ct2_model” are not both required at the same time…
Thanks

Hi there,

You can refer to the PR in which it was introduced.

Thanks!
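For reference, a minimal conf.json along the lines of that integration might look like this. This is only a sketch from my reading of the server code: the key names, in particular ct2_model, are assumptions, so double-check them against the PR.

```json
{
    "models_root": "./available_models",
    "models": [
        {
            "id": 100,
            "model": "averaged-10-epoch.pt",
            "ct2_model": "ende_ctranslate2",
            "timeout": 600,
            "load": true,
            "opt": {
                "gpu": -1,
                "beam_size": 5
            }
        }
    ]
}
```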

Does it have ensembling options like PyTorch models have?

“model” -> “models”
“ct2_model” -> “ct2_models”??

Hmm I don’t think ensemble decoding is supported in CTranslate2. Not sure if it’s intended to be supported at some point @guillaumekln?

We don’t have plans to add ensemble decoding in CTranslate2.

Ok. Thanks!

Hi,

How can I load the CTranslate2 model on CPU only (like the device option in the CLI)? Are there any options in the server for inter_threads and intra_threads?

EDIT:
I’ve seen that setting the gpu option to -1 executes the translations on CPU.

Furthermore, the inter_threads and intra_threads options are hardcoded, both to 1. Changing these values to 4 doesn’t show the same behaviour as translate_file from the Python API.

As stated in the CTranslate2 repository:

For CPU translations, the parameter inter_threads controls the number of batches a Translator instance can process in parallel. The translate_file method automatically takes advantage of this parallelization.
However, extra work may be needed when using the translate_batch method because multiple translations should be started concurrently from Python. If you are using a multithreaded HTTP server, this may already be the case. For other cases, you could use a ThreadPoolExecutor to submit multiple translations
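The ThreadPoolExecutor approach described above might look like the sketch below. The translate_batch function here is a stand-in for translator.translate_batch on a real ctranslate2.Translator created with inter_threads=4; it just echoes tokens so the sketch runs without a converted model.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_batch(batch):
    # Stand-in for translator.translate_batch(batch): upper-case each
    # token instead of actually decoding, so no model is needed.
    return [[tok.upper() for tok in sent] for sent in batch]

# Submit several batches concurrently so that up to inter_threads of
# them could be decoded in parallel by a real Translator.
batches = [
    [["hello", "world"]],
    [["good", "morning"]],
    [["how", "are", "you"]],
    [["see", "you", "soon"]],
]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(translate_batch, batches))
```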

translate_batch is used in the server but it is not parallelized, therefore inter_threads can effectively only be 1.

How did you verify it is not parallelized? Did you send parallel translation requests to the server?

Should parallel translation requests behave like inter_threads?
If I have 10000 sentences to translate, should I send them all together to the server, or should I send them in batches of, say, 32?

You should send multiple batch requests in parallel. For example, if you set inter_threads to 4, you should send at least 4 translation requests in parallel to fully utilize the server's capacity.
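On the client side, that could be sketched like this: cut the 10000 sentences into batches of 32 and keep several requests in flight at a time. The send_request function is a stand-in for an HTTP POST of one batch to the server (e.g. with urllib or requests); here it just counts sentences so the sketch runs without a live server.

```python
from concurrent.futures import ThreadPoolExecutor

def send_request(batch):
    # Stand-in for posting one batch to the translation server and
    # returning its result; here we just report the batch size.
    return len(batch)

sentences = [f"sentence {i}" for i in range(10000)]
batch_size = 32

# Cut the 10000 sentences into batches of 32...
batches = [sentences[i:i + batch_size]
           for i in range(0, len(sentences), batch_size)]

# ...and keep 4 requests in flight at a time, matching inter_threads=4.
with ThreadPoolExecutor(max_workers=4) as pool:
    translated = sum(pool.map(send_request, batches))
```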

Sorry, but… where do I set inter_threads to 4 in the server?

This is hardcoded for now (https://github.com/OpenNMT/OpenNMT-py/blob/dfdd4554cf1c9327c2c249ab11a978efe72a93d9/onmt/translate/translation_server.py#L91), but you can submit a PR with the few changes needed to make this configurable if you want.
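Such a change might boil down to something like this sketch, which reads the thread counts from the per-model configuration instead of hardcoding them to 1. The option names and the opt dict shape are hypothetical, not the actual translation_server.py code.

```python
def ct2_translator_args(opt):
    # Hypothetical: build the CTranslate2 Translator thread arguments
    # from a per-model config dict, defaulting to the current behaviour
    # (both hardcoded to 1) when the options are absent.
    return {
        "inter_threads": opt.get("inter_threads", 1),
        "intra_threads": opt.get("intra_threads", 1),
    }

# Example: a model config that requests 4 parallel batch workers.
args = ct2_translator_args({"inter_threads": 4})
```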