Async translation for ctranslate2 model

nehasoni3 · July 12, 2021, 5:56am

Hey all,
One question about GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models

It says async translation is supported. Does that require OpenMP, i.e. setting OPENMP_RUNTIME at compile time?

Also, how can I do async translation? Is there an example available?

nehasoni3 · July 12, 2021, 6:38am

github.com

OpenNMT/CTranslate2/blob/master/docs/python.md#note-on-parallel-translations

# Python

```python
import ctranslate2
```

## Model conversion API

```python
converter = ctranslate2.converters.OpenNMTTFConverter(
    model_spec: ModelSpec,   # Specification of the model to convert.
    src_vocab: str,          # Path to the source vocabulary.
    tgt_vocab: str,          # Path to the target vocabulary.
    model_path: str = None,  # Path to a OpenNMT-tf checkpoint (mutually exclusive with variables)
    variables: dict = None,  # Dict of variables name to value (mutually exclusive with model_path).
)

converter = ctranslate2.converters.OpenNMTPyConverter(
    model_path: str,         # Path to the OpenNMT-py model.
)

```

This file has been truncated.

Do I need OpenMP for this?

guillaumekln · July 12, 2021, 7:27am

As said in the first link you posted, OpenMP is used for intra_threads which configures the level of parallelism within each translation. If you enable parallel translations by increasing inter_threads, then OpenMP is not required.
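As a rough illustration of how the two settings divide a CPU budget, the sketch below splits a core count between parallel translations (inter_threads) and threads per translation (intra_threads). The helper name `split_threads` and the even-split policy are assumptions made for this example, not part of CTranslate2, and the model directory in the commented usage line is hypothetical.

```python
def split_threads(num_cores: int, parallel_translations: int) -> dict:
    """Split a core budget between inter_threads and intra_threads.

    Hypothetical helper: the even-split policy is an assumption,
    not a CTranslate2 recommendation.
    """
    if num_cores < 1 or parallel_translations < 1:
        raise ValueError("both arguments must be positive")
    # Never run more parallel translations than there are cores.
    inter_threads = min(parallel_translations, num_cores)
    # Give each translation an equal share of the cores, at least one.
    intra_threads = max(1, num_cores // inter_threads)
    return {"inter_threads": inter_threads, "intra_threads": intra_threads}


# Possible usage (requires ctranslate2 and a converted model directory):
#   import ctranslate2
#   translator = ctranslate2.Translator(
#       "my_model_dir/", device="cpu", **split_threads(8, 2))
```

With 8 cores and 2 parallel translations, this yields inter_threads=2 and intra_threads=4. Only intra_threads relies on OpenMP, as noted above.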

Asynchronous translations can be implemented in C++ with the TranslatorPool class. Here intra_threads corresponds to num_threads_per_translator and inter_threads corresponds to num_translators.

github.com

OpenNMT/CTranslate2/blob/v2.2.0/include/ctranslate2/translator_pool.h#L23

    
      
    namespace ctranslate2 {

      struct TranslationStats {
        size_t num_tokens = 0;
        size_t num_examples = 0;
        double total_time_in_ms = 0;
      };

      class BufferedTranslationWrapper;

      // A pool of Translators running in parallel.
      class TranslatorPool {
      public:
        TranslatorPool(size_t num_translators,
                       size_t num_threads_per_translator,
                       const std::string& model_dir,
                       const Device device = Device::CPU,
                       const int device_index = 0,
                       const ComputeType compute_type = ComputeType::DEFAULT);

        // Multi-device constructor.

The class usage should be pretty straightforward (see the method translate_batch_async).
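For readers unfamiliar with the pattern, here is a conceptual analogy using only Python's standard library. It is not CTranslate2 code: `ToyTranslatorPool` and `fake_translate` are hypothetical names, and the sketch assumes only that an asynchronous call returns future-like objects right away while a fixed number of workers process the jobs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def fake_translate(tokens: List[str]) -> List[str]:
    """Stand-in for a real model call (hypothetical): reverses the tokens."""
    return list(reversed(tokens))


class ToyTranslatorPool:
    """Toy analogy of a pool of translators running in parallel."""

    def __init__(self, num_translators: int) -> None:
        # num_translators plays the role of inter_threads.
        self._executor = ThreadPoolExecutor(max_workers=num_translators)

    def translate_batch_async(self, batch: List[List[str]]) -> List[Future]:
        # Returns immediately with one future per example in the batch.
        return [self._executor.submit(fake_translate, tokens) for tokens in batch]

    def close(self) -> None:
        # Wait for pending jobs, then release the worker threads.
        self._executor.shutdown(wait=True)


# Possible usage:
#   pool = ToyTranslatorPool(num_translators=2)
#   futures = pool.translate_batch_async([["a", "b"], ["c"]])
#   results = [f.result() for f in futures]  # blocks until each job is done
#   pool.close()
```

The caller can do other work between submitting the batch and calling `result()`, which is the point of an asynchronous interface.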

nehasoni3 · July 12, 2021, 7:35am

So the Python API in that second link, with intra_threads and inter_threads, will not help me much?
Also, where do I set the inter and intra thread values in this translate_batch_async definition?

guillaumekln · July 12, 2021, 7:36am

I updated my message:

nehasoni3 · July 12, 2021, 7:39am

OK.

github.com

OpenNMT/CTranslate2/blob/v2.2.0/examples/wngt2020/main.cc

#include <sentencepiece_processor.h>
#include <ctranslate2/translator_pool.h>
#include <ctranslate2/models/sequence_to_sequence.h>
#include <regex>

static std::vector<std::string> get_vocabulary_tokens(const ctranslate2::Vocabulary& vocabulary) {
  std::vector<std::string> tokens;
  const size_t size = vocabulary.size();
  tokens.reserve(size);
  for (size_t i = 0; i < size; ++i)
    tokens.emplace_back(vocabulary.to_token(i));
  return tokens;
}

int main(int, char* argv[]) {
  const std::string in_file = argv[1];
  const std::string out_file = argv[2];
  const int num_cores = std::stoi(std::string(argv[3]));

  const std::string model_path = "/model";

This file has been truncated.

This is a C++ inference code snippet. Can I follow this?

guillaumekln · July 12, 2021, 8:04am

You could, but this example is not using the asynchronous API. There is no official example for the asynchronous API. You should read the TranslatorPool class interface directly.

guillaumekln · September 10, 2021, 2:20pm

For reference, version 2.4.0 supports asynchronous translation from Python. The usage is very easy:

async_results = translator.translate_batch(batch, asynchronous=True)

for async_result in async_results:
    print(async_result.result())  # blocks until the result is available.
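
Building on the snippet above, a minimal sketch of queuing several batches before waiting on any of them might look like the following. The helper name `translate_batches_async` is an assumption for illustration; it relies only on the `translate_batch(batch, asynchronous=True)` call and the `.result()` method shown above, and takes the translator as an argument so it works with any object exposing that interface.

```python
from typing import Any, Iterable, List, Sequence


def translate_batches_async(
    translator: Any,
    batches: Iterable[Sequence[Sequence[str]]],
) -> List[Any]:
    """Submit every batch asynchronously, then collect results in order.

    Hypothetical helper: `translator` is expected to behave like
    ctranslate2.Translator (version 2.4.0 or later).
    """
    pending = []
    for batch in batches:
        # Returns immediately with one async result per example.
        pending.extend(translator.translate_batch(batch, asynchronous=True))
    # result() blocks until that particular translation is available.
    return [async_result.result() for async_result in pending]


# Possible usage (requires ctranslate2 and a converted model directory):
#   import ctranslate2
#   translator = ctranslate2.Translator("my_model_dir/", inter_threads=2)
#   results = translate_batches_async(translator, [batch_1, batch_2])
```

Submitting everything first lets the parallel translators configured with inter_threads stay busy while the caller waits.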