Great work @guillaumekln on bringing us CTranslate2 for model inference optimization.
I have been using OpenNMT to train MT models, came across CTranslate2 recently, and gave it a try for inference on CPU.
As a first step, I successfully converted my model to an int16-quantized CTranslate2 model based on a discussion in the forum.
However, I am not able to use the converted model for inference.
I tried to run inference using the Docker image as shown in the README file:
docker run --rm opennmt/ctranslate2:latest-ubuntu18-gpu --model /data/ned2eng123_ctranslate2/
I am getting the error below:
Please let me know if I am doing something wrong here. Thanks in advance.
You should mount the directory containing
ned2eng123_ctranslate2 in the Docker container. For example, the following option:
docker run --rm -v $PWD:/data [...]
will make the current directory available to the Docker container under /data.
Thanks @guillaumekln for the quick response. I am using a Windows machine,
so with small changes to the command, this is what I am trying:
> docker run --rm --volume "//c/Users/sprakash//Documents/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/
It now runs without any error. But when I pass the --src and --tgt arguments, I get invalid path errors:
docker run --rm --volume "//c/Users/sprakash/Documents/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/ --src "/testfile.txt" --tgt "/output.txt"
what(): Unable to open input file testfile.txt
Docker containers can't access files from the host unless they are mounted with the --volume option.
The easiest fix is to copy testfile.txt into
//c/Users/sprakash/Documents/ned2eng123_ctranslate2 and reference it as
/data/testfile.txt on the command line.
Yes, it's working now. Thanks a lot.
But the output file gives strange results, like "_2" for "Hello world" in testfile.txt (the input). I am using the pre-trained English-German model from OpenNMT for inference.
The model expects tokenized inputs. See the examples: https://github.com/OpenNMT/CTranslate2#translating
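As an illustration of the expected input format, here is a minimal sketch assuming the standard SentencePiece "▁" (U+2581) word-boundary marker; the actual pieces must come from the SentencePiece model the MT model was trained with, not from string manipulation like this:

```python
# Illustrates the subword-piece format CTranslate2 expects on each input line.
# NOTE: real tokenization must use your trained SentencePiece model; this only
# demonstrates the "▁" word-boundary convention and the detokenization step.

def detokenize(pieces):
    """Join subword pieces back into plain text."""
    return "".join(pieces).replace("\u2581", " ").strip()

pieces = ["\u2581Hel", "lo", "\u2581world"]
print(" ".join(pieces))    # what one line of the --src file looks like
print(detokenize(pieces))  # recovers "Hello world"
```

Translating raw, untokenized text explains garbage outputs like "_2": the model never saw whole words such as "Hello" during training, only subword pieces.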
Sorry for the delayed response. I tried with SentencePiece-tokenized input text and mounted it along with the converted model using the Docker image. In response, I am getting all UNK tokens, one for each corresponding token in the source file.
File snapshot is shown below:
Can you post the command lines to reproduce this output?
The model file and check.txt are placed in the ned2eng123_ctranslate2 directory.
check.txt contains the tokenized input text.
docker run --rm --volume "//c/Users/sprakash//Documents/github/python/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/ --src "/data/check.txt" --tgt "/data/output1.txt"
Can you post the content of check.txt?
The issue could come from:
- The input is not correctly tokenized
- The model is actually not compatible with CTranslate2 despite the successful conversion
- You passed incorrect word vocabularies to the conversion script
- Windows is causing issues
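The vocabulary point above can be checked without running the model: compare the tokens in the input file against the source vocabulary that was passed to the converter. A hedged sketch (the vocabulary format — one token per line, possibly followed by a count — and the file names are assumptions; adapt them to your own files):

```python
# Diagnose all-UNK output: find input tokens missing from the source vocabulary.
# Assumes a vocab file with one token per line (extra columns are ignored).

def unknown_tokens(vocab_lines, input_lines):
    """Return the set of input tokens absent from the vocabulary."""
    vocab = {line.split()[0] for line in vocab_lines if line.strip()}
    missing = set()
    for line in input_lines:
        missing.update(tok for tok in line.split() if tok not in vocab)
    return missing

# Example with inline data; in practice read e.g. src-vocab.txt and check.txt.
vocab = ["\u2581Hello", "\u2581world"]
text = ["\u2581Hello \u2581world \u2581foo"]
print(unknown_tokens(vocab, text))  # {'▁foo'}
```

If nearly every input token shows up as missing, the vocabulary passed to the conversion script does not match the tokenizer, which would explain a source file translated entirely to UNK.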