Translate unknown tokens

When I translate text like this "<unk>a<unk><unk>" result of translation is "<unk> a <unk> <unk>". I think tokenizer add some of spaces, mb anyone know best practices for remove addition spaces from translation?

thx. for attention

Do you have an example for that?

Detailed check show as spacers added by translator models. And as i saw correctly, tag <unk> was removed from model dictionary.

mb you can help with adding my own tag like a <mytag> as will be transparent translated to destination language without any transformation, like <mytag>SrcLang<mytag><mytag> to <mytag>DstLang<mytag><mytag>. mb you can give some advice or direct me to some whitepapers about it.

thx.

There are already some discussions about this on the forum. See for example:

1 Like