I’ve been digging around in the integration code for a while, but it is not clear to me which arguments are necessary. I guess “model” and “ct2_model” are not required at the same time…
You can refer to the PR in which it was introduced.
Does it have ensembling options like PyTorch models have?
“model” -> “models”
“ct2_model” -> “ct2_models”??
Hmm, I don’t think ensemble decoding is supported in CTranslate2. Not sure if it’s intended to be supported at some point @guillaumekln?
We don’t have plans to add ensemble decoding in CTranslate2.
How can I load the CTranslate2 model on CPU only (like the device option in the CLI)? Is there any option for this in the server?
I’ve seen that setting the gpu option to -1 makes it execute the translations on CPU.
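For reference, a server configuration entry for a CPU-only CTranslate2 model could look roughly like this. This is a sketch assuming the usual `available_models/conf.json` layout of the OpenNMT-py translation server; the placement of the `ct2_model` key and the exact paths are assumptions, so check the integration PR for the definitive keys. The `"gpu": -1` setting is the CPU switch mentioned above.

```json
{
  "models_root": "./available_models",
  "models": [
    {
      "id": 100,
      "model": "model.pt",
      "ct2_model": "ct2_model_dir",
      "opt": {
        "gpu": -1,
        "beam_size": 5
      }
    }
  ]
}
```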
The inter_threads and intra_threads options are hardcoded, both to 1. Changing these values to 4 doesn’t show the same behaviour as using the Python API.
As stated in the CTranslate2 repository:

> For CPU translations, the parameter `inter_threads` controls the number of batches a `Translator` instance can process in parallel. The `translate_file` method automatically takes advantage of this parallelization. However, extra work may be needed when using the `translate_batch` method because multiple translations should be started concurrently from Python. If you are using a multithreaded HTTP server, this may already be the case. For other cases, you could use a `ThreadPoolExecutor` to submit multiple translations.
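The `ThreadPoolExecutor` pattern the docs describe can be sketched like this. Here `fake_translate_batch` is a stand-in for `translator.translate_batch` on a real `ctranslate2.Translator`, since no actual model is loaded in this snippet; with a real `Translator` built with `inter_threads=4`, up to 4 of these calls can run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for translator.translate_batch on a real ctranslate2.Translator
# (hypothetical: it just reverses each token list as a dummy "translation").
def fake_translate_batch(batch):
    return [tokens[::-1] for tokens in batch]

batches = [
    [["hello", "world"]],
    [["how", "are", "you"]],
    [["good", "morning"]],
    [["see", "you", "later"]],
]

# Submit one translate_batch call per batch; the executor keeps several
# of them in flight at the same time, matching inter_threads.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_translate_batch, batches))

print(results[0])  # [['world', 'hello']]
```

The key point is that each `translate_batch` call is an independent unit of work submitted to the pool; the pool size should match `inter_threads` to keep all parallel translators busy.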
`translate_batch` is used in the server, but it is not parallelized, so inter_threads can effectively only be 1.
How did you verify it is not parallelized? Did you send parallel translation requests to the server?
So would parallel translation requests take advantage of inter_threads?
If I have 10000 sentences to translate, should I send them all together to the server, or should I send batches of, say, 32 to the server?
You should send multiple batch requests in parallel. For example, if you set `inter_threads` to 4, you should send at least 4 translation requests in parallel to fully utilize the server capability.
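Concretely, splitting a large job into fixed-size batches and sending them in parallel could look like the sketch below. `send_translation_request` is a hypothetical stand-in for the actual HTTP call a client would make to the server; only the chunking and the parallel submission are the point here.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32

def send_translation_request(batch):
    # Hypothetical stand-in: a real client would POST `batch` to the
    # translation server and return the translated sentences.
    return ["translated: " + s for s in batch]

sentences = ["sentence %d" % i for i in range(10000)]

# Split the 10000 sentences into batches of 32 (the last one is smaller).
batches = [sentences[i:i + BATCH_SIZE]
           for i in range(0, len(sentences), BATCH_SIZE)]

# Keep 4 requests in flight at once, matching inter_threads=4 on the server.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(send_translation_request, batches)

# Flatten the per-batch results back into one list, preserving order.
translations = [t for batch in results for t in batch]
print(len(translations))  # 10000
```

`executor.map` preserves the input order, so the flattened output lines up with the original sentence list even though the requests complete out of order.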
Sorry, but… where do I set `inter_threads` to 4 in the server?
This is hardcoded for now (https://github.com/OpenNMT/OpenNMT-py/blob/dfdd4554cf1c9327c2c249ab11a978efe72a93d9/onmt/translate/translation_server.py#L91), but you can PR the few changes to make this configurable if you want.