tester
(tester)
June 12, 2022, 1:13pm
1
''.join(spm.decode("_","")) is applied in the decoding step, but some tokens still come out with whitespace in between. The output is
T h i s is a query
and the final output should be
This is a query.
Here "This" is getting split badly.
For some test sentences it works perfectly, but for others some words still carry spaces.
Hello,
I’m sure the decoding function is working fine. It must be either your data, which may contain double spaces, or something in your code…
Can you share the code from your encoding step to your decoding step?
Best regards,
Samuel
tester
(tester)
June 14, 2022, 3:13am
3
I’m verifying the data format and will get back to you.
tel34
(Terence Lewis)
June 14, 2022, 9:50am
4
I agree with @SamuelLacombe. Such issues are usually down to the data. I’ve been using SentencePiece for several years with a variety of alphabets and have never encountered unwanted white spaces.
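
To illustrate how the data alone can produce this symptom, here is a minimal sketch in plain Python. It does not load a SentencePiece model. It only assumes the usual convention that pieces mark a preceding space with the "▁" character (U+2581, which is not an ASCII underscore) and that detokenizing means joining the pieces and turning that marker back into a space. The piece lists and the helper names detokenize and find_spaced_out_words are hypothetical, made up for this example, and are not taken from the original poster's pipeline.

```python
import re

# "▁" (U+2581) is the whitespace marker used in SentencePiece pieces.
# It looks like an underscore but is a different character.
META = "\u2581"


def detokenize(pieces):
    """Join pieces and turn the whitespace marker back into spaces."""
    return "".join(pieces).replace(META, " ").strip()


def find_spaced_out_words(text):
    """Hypothetical data check: find runs of 3+ single characters
    separated by single spaces, such as 'T h i s'."""
    return re.findall(r"(?:\b\w\b ){2,}\b\w\b", text)


# Hypothetical pieces for a clean sentence.
clean = ["▁This", "▁is", "▁a", "▁que", "ry", "."]

# Hypothetical pieces where every letter of "This" carries its own marker,
# for example because the text was already split into characters upstream.
spaced = ["▁T", "▁h", "▁i", "▁s", "▁is", "▁a", "▁que", "ry", "."]

print(detokenize(clean))   # This is a query.
print(detokenize(spaced))  # T h i s is a query.

# Scanning the raw data for the same pattern before blaming the decoder.
print(find_spaced_out_words("T h i s is a query"))  # ['T h i s']
print(find_spaced_out_words("This is a query."))    # []
```

If the second pattern shows up in the training or test files, the spaces are already present before decoding, and no change to the join step will remove them.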