I am trying to use an ONMT seq2seq model (a pre-trained LSTM) for inference in a Celery-based architecture. I am thinking of using the onmt package with command-line `onmt translate` calls, since the backend will be dockerized (the ONMT model won't be a service of its own, but part of another, much larger service). Inference will run on a CPU machine (for now, at least).
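For context, this is roughly the kind of invocation I have in mind from inside a Celery task: shelling out to the translator with `subprocess`. All paths here are placeholders, and I'm assuming the OpenNMT-py entry point name (`onmt_translate`) and its standard `-model`/`-src`/`-output` flags; please correct me if that's not the right way to drive it.

```python
import subprocess

def build_translate_cmd(model_path: str, src_path: str, out_path: str) -> list:
    # Assemble the CLI call; flags assumed from OpenNMT-py's translate script
    return [
        "onmt_translate",
        "-model", model_path,   # pre-trained LSTM checkpoint (placeholder path)
        "-src", src_path,       # source file, one sentence per line
        "-output", out_path,    # hypotheses get written here
    ]

def translate(model_path: str, src_path: str, out_path: str) -> None:
    # Blocks inside the Celery task until the subprocess finishes
    subprocess.run(build_translate_cmd(model_path, src_path, out_path), check=True)
```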
I wanted to know: is there built-in functionality to multiprocess the inference across the available CPU cores in order to improve performance?
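If nothing is built in, would the sensible fallback be to do it myself at the application level? A minimal sketch of what I mean, splitting the input into shards and running one translate subprocess per shard across the cores (shard/output filenames and the `onmt_translate` flags are my assumptions, not something from the docs):

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

def split_shards(lines: list, n_shards: int) -> list:
    # Round-robin the source sentences into n_shards buckets
    shards = [[] for _ in range(n_shards)]
    for i, line in enumerate(lines):
        shards[i % n_shards].append(line)
    return shards

def translate_shard(i: int) -> None:
    # One onmt_translate subprocess per shard (hypothetical filenames)
    subprocess.run(
        ["onmt_translate", "-model", "model.pt",
         "-src", f"shard_{i}.txt", "-output", f"pred_{i}.txt"],
        check=True,
    )

def translate_parallel(n_shards: int, n_workers: int) -> None:
    # Fan the shards out over the CPU cores, n_workers at a time
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        list(ex.map(translate_shard, range(n_shards)))
```

I'm not sure whether this would actually beat a single `onmt translate` call, though, since each subprocess pays the model-loading cost again.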
Also, does using the onmt command line sound like a reasonable choice for this setup?