Async translation for ctranslate2 model

Hey all,
One question about this project: GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models

The README says async translation is supported. Does that require OpenMP, i.e. setting OPENMP_RUNTIME at compile time?
GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models

Also, how can I do async translation? Is there any example available?


As noted in the first link you posted, OpenMP is used for intra_threads, which configures the level of parallelism within each translation. If you instead enable parallel translations by increasing inter_threads, then OpenMP is not required.
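To illustrate what inter_threads-style parallelism means (several independent translations running concurrently, no intra-translation parallelism and hence no OpenMP needed), here is a minimal stand-alone sketch. The translate function below is a stand-in, not the CTranslate2 API:

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Stand-in "translation": reverses the string. A real translator would
// run the Transformer model on the source tokens instead.
std::string reverse_translate(const std::string& source) {
  return std::string(source.rbegin(), source.rend());
}

// Run one "translation" per input on its own thread, mimicking
// inter_threads > 1: each translation is an independent single-threaded
// job, so no OpenMP (intra-translation parallelism) is required.
std::vector<std::string> translate_parallel(const std::vector<std::string>& inputs) {
  std::vector<std::string> outputs(inputs.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    workers.emplace_back([&outputs, &inputs, i] {
      outputs[i] = reverse_translate(inputs[i]);
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return outputs;
}
```

This only shows the threading shape; the real library pins one translator instance per worker rather than spawning a thread per request.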

Asynchronous translations can be implemented in C++ with the TranslatorPool class. Here intra_threads corresponds to num_threads_per_translator and inter_threads corresponds to num_translators.

The class usage should be pretty straightforward (see the method translate_batch_async).
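Since there is no official example, the shape of a future-based call like translate_batch_async can be sketched without the library itself. Everything below is a hypothetical stand-in, not the real TranslatorPool interface; it only shows the submit-a-batch, get-a-future pattern:

```cpp
#include <cctype>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a translation result; the real API returns
// ctranslate2::TranslationResult objects.
struct FakeResult {
  std::vector<std::string> tokens;
};

// Stand-in "translator": uppercases tokens instead of translating them.
FakeResult fake_translate(const std::vector<std::string>& source) {
  FakeResult result;
  for (const auto& token : source) {
    std::string upper;
    for (char c : token) {
      upper.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
    result.tokens.push_back(upper);
  }
  return result;
}

// Submit a batch and immediately get a future back, mirroring how an
// async API returns futures instead of blocking until completion.
std::future<FakeResult> translate_batch_async_sketch(std::vector<std::string> source) {
  return std::async(std::launch::async, fake_translate, std::move(source));
}
```

The caller posts several batches, keeps the futures, and calls .get() on each one later; the real TranslatorPool manages num_translators worker threads internally instead of spawning a task per call as std::async does here.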

So the Python API from that second link, with intra_threads and inter_threads, will not help me much?
Also, where do I set the values of inter_threads and intra_threads in this translate_batch_async definition?

I updated my message:


This is a C++ inference code snippet. Can I follow this?

You could, but this example does not use the asynchronous API. There is no official example for the asynchronous API; you should read the TranslatorPool class interface directly.