Async translation for a CTranslate2 model

Hey all,
One question about GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models

This says async translation is supported. Does that need OpenMP, i.e. setting OPENMP_RUNTIME at compile time? GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models

Also, how can I do async translation? Is there an example available?


As said in the first link you posted, OpenMP is used for intra_threads which configures the level of parallelism within each translation. If you enable parallel translations by increasing inter_threads, then OpenMP is not required.

Asynchronous translations can be implemented in C++ with the TranslatorPool class. Here intra_threads corresponds to num_threads_per_translator and inter_threads corresponds to num_translators.

The class usage should be pretty straightforward (see the method translate_batch_async).
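For the Python API mentioned in the question, the same two settings are keyword arguments of the ctranslate2.Translator constructor. The following is only a minimal sketch of that mapping: thread_options and load_translator are hypothetical helper names, the model path is a placeholder, and the thread counts are arbitrary.

```python
def thread_options(num_translators, num_threads_per_translator):
    """Map the TranslatorPool names onto the Python keyword arguments."""
    return {
        "inter_threads": num_translators,             # parallel translations
        "intra_threads": num_threads_per_translator,  # threads within each translation
    }


def load_translator(model_dir, options):
    # Imported here so the mapping above can be inspected
    # even when the ctranslate2 package is not installed.
    import ctranslate2

    return ctranslate2.Translator(model_dir, device="cpu", **options)


options = thread_options(num_translators=4, num_threads_per_translator=1)
print(options)

# Placeholder path, left commented out because it needs a converted model:
# translator = load_translator("path/to/model_dir", options)
```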

So the Python API in that second link, with intra_threads and inter_threads, will not help me much?
Also, where do I set the values of inter_threads and intra_threads in this translate_batch_async definition?

I updated my message:


This is a C++ inference code snippet. Can I follow this?

You could, but this example is not using the asynchronous API. There is no official example for the asynchronous API. You should read the TranslatorPool class interface directly.

For reference, version 2.4.0 supports asynchronous translation from Python. The usage is very easy:

async_results = translator.translate_batch(batch, asynchronous=True)

for async_result in async_results:
    print(async_result.result())  # blocks until the result is available.
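
To show why this pattern is useful, here is a rough sketch that runs without a model. StandInTranslator is a hypothetical stand-in built on the standard library, not the CTranslate2 implementation. It only mimics the shape of the call: translate_batch returns immediately with objects that expose result(), and each result() call blocks until that translation is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class StandInTranslator:
    """Hypothetical stand-in that mimics translate_batch(..., asynchronous=True)."""

    def __init__(self, inter_threads=2):
        # One worker per parallel translation, playing the role of inter_threads.
        self._pool = ThreadPoolExecutor(max_workers=inter_threads)

    def _translate_one(self, tokens):
        time.sleep(0.05)  # pretend the model is busy
        return [token.upper() for token in tokens]  # fake "translation"

    def translate_batch(self, batch, asynchronous=False):
        futures = [self._pool.submit(self._translate_one, tokens) for tokens in batch]
        if asynchronous:
            return futures  # returned immediately; each item has .result()
        return [future.result() for future in futures]


translator = StandInTranslator(inter_threads=2)
batch = [["hello", "world"], ["good", "morning"]]

async_results = translator.translate_batch(batch, asynchronous=True)

# The caller is free to do other work here while translations run.

outputs = [async_result.result() for async_result in async_results]
print(outputs)
```

With the real library, the other work would typically be preparing the next batch or handling other requests before collecting the results.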