<unk> tokens in every inference from optimized model


I converted a model to the CTranslate2 format and tried to run inference with the converted model. I use a SentencePiece model to encode the segments before passing them to the model.
I am getting an “unk” token at the end of every segment in the CTranslate2 output.

Example: (I am showing mock data here)
Some_ text_ here_ <unk>
Nex_t_ line_ <unk>

Note: I am using the default values for all parameters in the model inference API, and I am running inference from a Docker image on a Windows machine.


  1. Can you show an example of what the input looks like after tokenization?
  2. What output do you get with the original training framework (OpenNMT-py or OpenNMT-tf)?
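
For the first question, a small helper like the one below may make the tokenized input easier to share. This is only a sketch: `describe_tokens` is a hypothetical name, and the token list is made up. It prints each token with `repr` so that invisible characters show up, and flags tokens that contain control characters. The trailing "\r" in the sample is an assumption, included only to show what a stray line ending would look like; it is not a confirmed cause of the problem.

```python
import unicodedata


def describe_tokens(tokens):
    """Return (repr(token), has_control_char) pairs for a tokenized segment."""
    report = []
    for token in tokens:
        # Unicode category "Cc" covers control characters such as "\r" and "\n".
        has_control = any(unicodedata.category(ch) == "Cc" for ch in token)
        report.append((repr(token), has_control))
    return report


# Made-up tokenized segment; the final "\r" is a hypothetical stray character.
tokens = ["▁Some", "▁text", "▁here", "\r"]
for token_repr, has_control in describe_tokens(tokens):
    print(token_repr, "<- control character" if has_control else "")
```

Posting this output for one or two real segments, exactly as they are passed to the model, would show whether anything unexpected is left at the end of each line.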