Great work @guillaumekln on bringing us something like CTranslate2 for model inference optimization.
I have been using OpenNMT for training MT models, came across CTranslate2 recently, and gave it a try for inference on CPU.
As a first step, I successfully converted my model to an int16-quantized CTranslate2 model, based on a discussion in the forum.
However, I am not able to use the converted model for inference.
According to the README, we can run inference using the Docker image:
docker run --rm opennmt/ctranslate2:latest-ubuntu18-gpu --model /data/ned2eng123_ctranslate2/
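As an alternative to the Docker image, the converted model can also be tried directly through the CTranslate2 Python API. Here is a minimal sketch; the model directory and input tokens are hypothetical placeholders, and the `.hypotheses` accessor assumes a recent `ctranslate2` release:

```python
import os

def translate_cpu(model_dir, batch):
    """Translate a batch of pre-tokenized sentences on CPU with CTranslate2."""
    if not os.path.isdir(model_dir):
        # allows the sketch to run even when the model is not present
        return "model directory not found: %s" % model_dir
    import ctranslate2  # imported lazily so the sketch runs without the package
    translator = ctranslate2.Translator(model_dir, device="cpu")
    results = translator.translate_batch(batch)
    # each result holds the hypotheses for one input sentence
    return [r.hypotheses[0] for r in results]

# hypothetical model directory and SentencePiece-style tokens
print(translate_cpu("ned2eng123_ctranslate2", [["▁Hallo", "▁wereld"]]))
```

If the Python API loads the model fine but the Docker command fails, that would point to the command line (or the mounted path) rather than the converted model itself.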
Instead, I am getting an error like the one below:
Please let me know if I am doing something wrong here. Thanks in advance.