Great work @guillaumekln on bringing us CTranslate2 for model inference optimization.
I have been using OpenNMT to train MT models, came across CTranslate2 recently, and gave it a try for inference on CPU.
As a first step, I successfully converted my model to an int16-quantized CTranslate2 model based on a discussion in the forum.
However, I am not able to use the converted model for inference.
I tried to run inference using the Docker image as shown in the README file:
docker run --rm opennmt/ctranslate2:latest-ubuntu18-gpu --model /data/ned2eng123_ctranslate2/
I am getting the error below:
Please let me know if I am doing something wrong here. Thanks in advance.
You should mount the directory containing
ned2eng123_ctranslate2 in the Docker container. For example, the following option:
docker run --rm -v $PWD:/data [...]
will make the current directory available to the Docker container under /data.
Thanks @guillaumekln for the quick response. I am using a Windows machine,
so with small changes to the command, this is what I am trying:
> docker run --rm --volume "//c/Users/sprakash//Documents/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/
It now runs without any error. But when I pass the --src and --tgt arguments, I get invalid path errors:
docker run --rm --volume "//c/Users/sprakash/Documents/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/ --src "/testfile.txt" --tgt "/output.txt"
what(): Unable to open input file testfile.txt
Docker containers can't access files from the host unless they are mounted with the --volume option.
The easiest fix is to copy testfile.txt into
//c/Users/sprakash/Documents/ned2eng123_ctranslate2 and reference it as
/data/testfile.txt on the command line.
Yes, it's working now. Thanks a lot.
But the output file gives strange results, like "_2" for "Hello world" in testfile.txt (the input). I am using the pre-trained English-German model from OpenNMT for inference.
The model expects tokenized inputs. See the examples: https://github.com/OpenNMT/CTranslate2#translating
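As an illustration of the expected input format, here is a minimal sketch assuming the standard SentencePiece "▁" (U+2581) word-boundary marker; the actual pieces must come from the SentencePiece model the MT model was trained with, not from string manipulation like this:

```python
# Illustrates the subword-piece format CTranslate2 expects on each input line.
# NOTE: real tokenization must use your trained SentencePiece model; this only
# demonstrates the "▁" word-boundary convention and the detokenization step.

def detokenize(pieces):
    """Join subword pieces back into plain text."""
    return "".join(pieces).replace("\u2581", " ").strip()

pieces = ["\u2581Hel", "lo", "\u2581world"]
print(" ".join(pieces))    # what one line of the --src file looks like
print(detokenize(pieces))  # recovers "Hello world"
```

Translating raw, untokenized text explains garbage outputs like "_2": the model never saw whole words such as "Hello" during training, only subword pieces.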
Sorry for the delayed response. I tried with SentencePiece-tokenized input text and mounted it along with the converted model using the Docker image. In response, I am getting all UNK tokens, one for each corresponding token in the source file.
File snapshot is shown below:
Can you post the command lines to reproduce this output?
The model file and check.txt are placed in the ned2eng123_ctranslate2 directory.
check.txt contains the tokenized input text.
docker run --rm --volume "//c/Users/sprakash//Documents/github/python/ned2eng123_ctranslate2:/data" opennmt/ctranslate2:latest-ubuntu18 --model /data/ --src "/data/check.txt" --tgt "/data/output1.txt"
Can you post the content of check.txt?
The issue could come from:
- The input is not correctly tokenized
- The model is actually not compatible with CTranslate2 despite the successful conversion
- You passed incorrect word vocabularies to the conversion script
- Windows is causing issues
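The vocabulary point above can be checked without running the model: compare the tokens in the input file against the source vocabulary that was passed to the converter. A hedged sketch (the vocabulary format — one token per line, possibly followed by a count — and the file names are assumptions; adapt them to your own files):

```python
# Diagnose all-UNK output: find input tokens missing from the source vocabulary.
# Assumes a vocab file with one token per line (extra columns are ignored).

def unknown_tokens(vocab_lines, input_lines):
    """Return the set of input tokens absent from the vocabulary."""
    vocab = {line.split()[0] for line in vocab_lines if line.strip()}
    missing = set()
    for line in input_lines:
        missing.update(tok for tok in line.split() if tok not in vocab)
    return missing

# Example with inline data; in practice read e.g. src-vocab.txt and check.txt.
vocab = ["\u2581Hello", "\u2581world"]
text = ["\u2581Hello \u2581world \u2581foo"]
print(unknown_tokens(vocab, text))  # {'▁foo'}
```

If nearly every input token shows up as missing, the vocabulary passed to the conversion script does not match the tokenizer, which would explain a source file translated entirely to UNK.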