Translating names issue

Hi everyone,

The issue is simple: after I finish training the model, I am able to translate simple or complex sentences as long as they only contain words that appear in the dataset. But what can I do if I try to translate a sentence with a person's name, a country's name, or any other token that does not appear in the dataset? In other words, is there any way to keep an unknown word in the final translation?

For example, suppose I want to translate “Mike Wazowski is an alien with an eye in his face”, and “Mike Wazowski” does not appear in my dataset. What can I do to keep “Mike Wazowski” in the final translation, even though it is an unknown word?

I hope you can help me. Thank you very much.

Sincerely,
John.

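Look into subword tokenization, such as byte-pair encoding (BPE). Instead of a fixed word vocabulary, the text is split into smaller units learned from the training data, so a word the model has never seen (like “Wazowski”) is broken into pieces it has seen rather than collapsing into a single unknown token. This lets you cover a much larger effective vocabulary and helps handle cases like names.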
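To illustrate the idea, here is a minimal, self-contained sketch of BPE in plain Python. The helper names (`learn_bpe`, `segment`, `detokenize`), the `</w>` end-of-word marker and the toy corpus are illustrative choices, not any library's API. It learns merges from a tiny corpus, then shows that the unseen name “Wazowski” is still split into known pieces and can be reassembled exactly. It assumes every character of the name occurs in the training data; real tokenizers cover unseen characters with a character vocabulary or byte-level fallback.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (illustrative), keeps merges from crossing word boundaries


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn an ordered list of merges by repeatedly merging the most frequent adjacent pair."""
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split any word, seen or unseen, into subwords by replaying the learned merges in order."""
    symbols = list(word) + [END]
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def detokenize(subwords):
    """Glue subwords back into text; each end-of-word marker becomes a space."""
    return "".join(subwords).replace(END, " ").strip()


# Toy training data (hypothetical): "Wazowski" never appears in it.
corpus = ("Mike is an alien with an eye in his face . "
          "Walter saw a wizard at the zoo . Ski slopes were snowy").split()
merges = learn_bpe(corpus, num_merges=30)

pieces = segment("Wazowski", merges)
print(pieces)  # several known subwords instead of one unknown token
print(detokenize(segment("Mike", merges) + pieces))  # prints "Mike Wazowski"
```

In practice you would use an established implementation such as subword-nmt or SentencePiece rather than writing your own. Because the name reaches the model as ordinary subword pieces, the model can often learn to pass them through to the output, where they are joined back into the original name.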