Sentencepiece decoder working inconsistently

''.join(spm.decode("_","")) is applied in the decoding step, but some tokens still end up with whitespace between their characters.

For example, I get "T h i s is a query", and the final output should be

This is a query.

Here "This" is getting split into separate characters.

For some test sentences it works perfectly, but in others some words still carry spaces.


I’m sure the decoding function is working fine. It must be either your data, which may contain double spaces, or something in your code…

Can you share your code, from the encoding step to the decoding step?

Best regards,

I’m verifying the data format and I’ll get back to you.

I agree with @SamuelLacombe. Such issues are usually down to the data. I’ve been using SentencePiece for several years with a variety of alphabets and have never encountered unwanted whitespace.
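
If it helps with the check, here is a minimal sketch of manual detokenization in plain Python, without loading a model. It assumes the pieces carry SentencePiece's usual word-boundary marker "▁" (U+2581), which is a different character from the ASCII underscore "_". The piece lists and the helper names detokenize and collapse_spaces are made up for illustration; they are not taken from your code or from the library.

```python
# Assumption: pieces use U+2581 as the word-boundary marker.
WORD_BOUNDARY = "\u2581"  # looks like "_" but is a different character


def detokenize(pieces):
    """Join pieces with no separator, then map the marker back to spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


def collapse_spaces(text):
    """Collapse runs of whitespace in raw text to single spaces."""
    return " ".join(text.split())


# Hypothetical pieces for a cleanly segmented sentence.
clean = ["\u2581This", "\u2581is", "\u2581a", "\u2581query", "."]

# Hypothetical pieces where every character of "This" carries its own
# marker, as can happen when the input text already had spaces in it.
spaced = ["\u2581T", "\u2581h", "\u2581i", "\u2581s",
          "\u2581is", "\u2581a", "\u2581query"]

print(detokenize(clean))
print(detokenize(spaced))
print(collapse_spaces("This  is a   query."))
```

If the second case matches what you see, the spaces are already encoded in the pieces, so the place to look is the input text or the encoding step rather than the decoder. Replacing "_" instead of "▁", or joining the pieces with " " instead of "", would also leave stray characters or spaces in the output.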