Great work, @guillaumekln, on bringing us CTranslate2 for model inference optimization.

I have been using OpenNMT to train MT models. I came across CTranslate2 recently and gave it a try for inference on CPU.

As a first step, I successfully converted my model to a CTranslate2 model with int16 quantization, based on a discussion in this forum.
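For reference, the conversion step was along these lines, assuming the OpenNMT-py converter (the checkpoint filename here is illustrative, not my exact one):

```shell
# Convert an OpenNMT-py checkpoint to a CTranslate2 model with int16 weights
ct2-opennmt-py-converter --model_path averaged-model.pt \
    --output_dir ned2eng123_ctranslate2 \
    --quantization int16
```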

However, I am not able to use the converted model for inference.

According to the README file, inference can be run with the Docker image:

```
docker run --rm opennmt/ctranslate2:latest-ubuntu18-gpu --model /data/ned2eng123_ctranslate2/
```

Instead, I am getting an error like the one below:
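Alternatively, would testing the converted model directly through the Python API be a good way to isolate the issue? I am thinking of something like this (the model path is illustrative, matching the converted directory above):

```python
import ctranslate2

# Load the converted int16 model on CPU (path is illustrative)
translator = ctranslate2.Translator("ned2eng123_ctranslate2/", device="cpu")

# Translate a batch of pre-tokenized sentences; the tokens must match
# the subword vocabulary the model was trained with
results = translator.translate_batch([["▁Hello", "▁world"]])
print(results)
```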

Please let me know if I am doing something wrong here. Thanks in advance!