ONMT multiprocess inference on CPU in production

I am trying to use an ONMT seq2seq model (a pre-trained LSTM) for inference in a Celery-based architecture. I am thinking of using the onmt package with command-line onmt translate calls, since the backend would be dockerized (the onmt model won't be a service of its own but part of another, much larger service). The inference would be on a CPU machine (for now at least).
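Concretely, the plan is something along these lines, assuming the OpenNMT-py `onmt_translate` entry point; the broker URL and model path here are just placeholders:

```python
import subprocess
from celery import Celery

# Placeholder broker URL; the real service has its own Celery configuration.
app = Celery("translation", broker="redis://localhost:6379/0")

@app.task
def translate_file(src_path: str, out_path: str) -> str:
    """Shell out to the onmt_translate CLI for one source file."""
    subprocess.run(
        [
            "onmt_translate",
            "-model", "/models/seq2seq_lstm.pt",  # placeholder model path
            "-src", src_path,
            "-output", out_path,
        ],
        check=True,
    )
    return out_path
```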

I wanted to know if there is functionality to multiprocess the inference across the available CPU cores in order to enhance performance?

Also, does using the onmt command line sound like a good choice?

Do you mean using multiple cores for one translation, or running multiple translations in parallel?

See:

@guillaumekln I mean running multiple translations as separate processes on the CPU. For my use case I need to run multiple instances of onmt inference. If I use Python's multiprocessing to spawn multiple processes that call onmt, I end up in a deadlock where none of the processes complete.
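Roughly, the pattern looks like this; `run_onmt` is just a stand-in for my actual onmt caller:

```python
import multiprocessing as mp

def run_onmt(src_path: str, out_path: str) -> None:
    """Stand-in for the actual onmt caller: load the model, translate
    src_path, and write predictions to out_path."""
    # ... model loading and translation go here ...
    pass

if __name__ == "__main__":
    jobs = [("src-0.txt", "pred-0.txt"), ("src-1.txt", "pred-1.txt")]
    # Forcing the "spawn" start method is one possible workaround to look at:
    # on Linux the default is "fork", and forking a process that has already
    # imported torch is a known cause of hangs with OpenMP thread pools.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        pool.starmap(run_onmt, jobs)
```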

Also, is there a method to assign a specific inference to a specific CPU core?
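(On Linux I can already pin a process externally, e.g. with `taskset` or something like the snippet below, but I was wondering whether onmt itself exposes anything for this.)

```python
import os

def pin_to_core(core: int) -> None:
    # Linux-only: restrict the current process to a single CPU core,
    # roughly equivalent to launching it under `taskset -c <core>`.
    os.sched_setaffinity(0, {core})

pin_to_core(2)  # e.g. pin this worker to core 2 before running inference
```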

I would typically recommend training a Transformer model and running inference with CTranslate2, which can efficiently run translations in parallel on CPU.
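For example, something like this (thread counts are illustrative, and the checkpoint must first be converted with the OpenNMT-py converter):

```python
import ctranslate2

# Convert the OpenNMT-py checkpoint first, e.g.:
#   ct2-opennmt-py-converter --model_path model.pt --output_dir ct2_model
translator = ctranslate2.Translator(
    "ct2_model",
    device="cpu",
    inter_threads=4,   # number of translations processed in parallel
    intra_threads=1,   # threads used within each translation
)

# Inputs are pre-tokenized; batches are dispatched across the parallel workers.
batch = [["Hello", "world", "!"], ["How", "are", "you", "?"]]
results = translator.translate_batch(batch)
print(results[0].hypotheses[0])  # best hypothesis for the first sentence (recent API)
```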