Great work, @guillaumekln, on bringing us CTranslate2 for model inference optimization.

I have been using OpenNMT to train MT models. I came across CTranslate2 recently and gave it a try for inference on CPU.

As a first step, I successfully converted my model to a CTranslate2 model with int16 quantization, based on a discussion in this forum.
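For reference, the conversion step was along these lines, assuming the OpenNMT-py converter (the checkpoint filename here is illustrative, not my exact one):

```shell
# Convert an OpenNMT-py checkpoint to a CTranslate2 model with int16 weights
ct2-opennmt-py-converter --model_path averaged-model.pt \
    --output_dir ned2eng123_ctranslate2 \
    --quantization int16
```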

However, I am not able to use the converted model for inference.

According to the README file, inference can be run with the Docker image:

```
docker run --rm opennmt/ctranslate2:latest-ubuntu18-gpu --model /data/ned2eng123_ctranslate2/
```

Instead, I am getting an error like the one below:
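Alternatively, would testing the converted model directly through the Python API be a good way to isolate the issue? I am thinking of something like this (the model path is illustrative, matching the converted directory above):

```python
import ctranslate2

# Load the converted int16 model on CPU (path is illustrative)
translator = ctranslate2.Translator("ned2eng123_ctranslate2/", device="cpu")

# Translate a batch of pre-tokenized sentences; the tokens must match
# the subword vocabulary the model was trained with
results = translator.translate_batch([["▁Hello", "▁world"]])
print(results)
```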

Please let me know if I am doing something wrong here. Thanks in advance!