<unk> tokens in every inference from optimized model

sprakash · April 1, 2020, 7:23am

Hi,

I converted a model to CTranslate2 and tried running inference with the converted model. I use a SentencePiece model to encode the segments before passing them to the model.
I am getting an <unk> token at the end of every segment in the CTranslate2 output.

Example: (I am showing mock data here)
Some_ text_ here_ <unk>
Nex_t_ line_ <unk>

Note: I am using the default values for all parameters in the model inference API, and I am running inference through a Docker image on a Windows machine.

guillaumekln · April 1, 2020, 7:28am

Hi,

Can you show an example of what the input looks like after tokenization?
What is the output that you get with the original training framework (OpenNMT-py or OpenNMT-tf)?
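
For reference, here is a minimal sketch of one way to dump the tokenized input and check which segments end with an unknown token. It is an illustration only, not the poster's actual setup: the helper names `encode_segments` and `trailing_unk_report`, the model path argument, and the `<unk>` spelling are assumptions, and the encoding step assumes a recent `sentencepiece` Python package.

```python
from typing import List

UNK = "<unk>"  # assumed spelling of the unknown token; adjust to your vocabulary


def encode_segments(model_path: str, segments: List[str]) -> List[List[str]]:
    """Encode raw segments into SentencePiece pieces (as strings).

    model_path is a hypothetical path to your own SentencePiece model.
    """
    # Imported lazily so the helper below works without sentencepiece installed.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=model_path)
    return [sp.encode(segment, out_type=str) for segment in segments]


def trailing_unk_report(batch: List[List[str]], unk: str = UNK) -> List[bool]:
    """Return, for each segment, whether its last token is the unknown token."""
    return [bool(tokens) and tokens[-1] == unk for tokens in batch]


if __name__ == "__main__":
    # Mock tokens shaped like the example in the question (not real model output).
    outputs = [["Some_", "text_", "here_", "<unk>"], ["Nex", "t_", "line_"]]
    for tokens, flagged in zip(outputs, trailing_unk_report(outputs)):
        print(" ".join(tokens), "| trailing <unk>:", flagged)
```

Printing the result of `encode_segments` for a few segments gives the tokenized input asked about above, and running `trailing_unk_report` on both the CTranslate2 output and the original framework's output shows whether the extra token appears only after conversion.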