OpenNMT Forum

Unknown word not left untranslated


I am developing a Punjabi-to-English model.
For translation, I am using the following command with the -replace_unk option:

onmt_translate -model "march2020/nmt(rnn)/" -src "march2020/nmt(rnn)/test.pun" -output "march2020/nmt(rnn)/test_pred_rnn_17_epoch.eng.txt" -replace_unk -verbose

The source side of my training data does not contain the word “ਸੁਲਖਣੀ” in any sentence, so I expect it to be copied to the output unchanged at test time.

But the model produces incorrect output:

Input: ਉਸਦੀ ਪਤਨੀ ਦਾ ਨਾਮ ਸੁਲਖਣੀ ਦੇਵੀ ਹੈ।
Output by NMT: the name of his wife is 281.50 devi .
Output by SMT: the name of his wife ਸੁਲਖਣੀ devi .

The NMT model translated “ਸੁਲਖਣੀ” as “281.50”.
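The OpenNMT documentation describes -replace_unk as replacing each generated `<unk>` token with the source token that received the highest attention weight (or with its phrase-table translation, if a phrase table is supplied). The following minimal Python sketch shows that copy step. It is not OpenNMT's code: `replace_unk` is a hypothetical helper, and the attention matrix is a hand-made alignment, not real model output.

```python
from typing import List, Sequence

UNK = "<unk>"

def replace_unk(src: Sequence[str], pred: Sequence[str],
                attn: Sequence[Sequence[float]]) -> List[str]:
    """Copy the most-attended source token into every <unk> slot.

    attn[i][j] is the attention weight of target position i on source
    position j (hypothetical values here, not real model output).
    """
    out: List[str] = []
    for i, tok in enumerate(pred):
        if tok == UNK:
            row = attn[i]
            best_j = max(range(len(row)), key=lambda j: row[j])
            out.append(src[best_j])
        else:
            out.append(tok)  # real vocabulary tokens are never touched
    return out

src = "ਉਸਦੀ ਪਤਨੀ ਦਾ ਨਾਮ ਸੁਲਖਣੀ ਦੇਵੀ ਹੈ।".split()

# Hypothetical alignment: the source index each target position attends to most.
alignment = [3, 3, 2, 0, 1, 6, 4, 5, 6]
attn = [[1.0 if j == k else 0.0 for j in range(len(src))] for k in alignment]

pred_unk = "the name of his wife is <unk> devi .".split()
pred_real = "the name of his wife is 281.50 devi .".split()

print(" ".join(replace_unk(src, pred_unk, attn)))   # <unk> is replaced by ਸੁਲਖਣੀ
print(" ".join(replace_unk(src, pred_real, attn)))  # unchanged: no <unk> to replace
```

Without a phrase table, this step can only copy tokens that already appear in the source sentence. Because “281.50” does not occur in the input, the model probably emitted it as an ordinary target-vocabulary token rather than as `<unk>`. In that case, -replace_unk had nothing to replace.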

How can I resolve this? Any help would be appreciated.


Which tokenization method are you applying to the input? You may want to look into subword segmentation techniques such as SentencePiece.
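Subword segmentation helps because a word the model has never seen can be split into smaller pieces that do appear in its vocabulary. Those pieces no longer collapse into `<unk>` or get mapped to an unrelated token. The following minimal sketch applies byte-pair-encoding (BPE) merges, one of the model types SentencePiece supports. This is a hand-rolled illustration, not SentencePiece's implementation. The merge table and both vocabularies are hypothetical, as if learned from other Punjabi training words.

```python
from typing import Dict, List, Set, Tuple

def bpe_encode(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Segment one word by applying BPE merges in priority order."""
    ranks: Dict[Tuple[str, str], int] = {pair: r for r, pair in enumerate(merges)}
    symbols = list(word)  # start from single Unicode code points
    while len(symbols) > 1:
        # Adjacent pairs that have a learned merge, as (rank, position).
        candidates = [(ranks[pair], i)
                      for i, pair in enumerate(zip(symbols, symbols[1:]))
                      if pair in ranks]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank = merge learned earliest
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

def lookup(tokens: List[str], vocab: Set[str]) -> List[str]:
    """Map out-of-vocabulary tokens to <unk>, as a fixed-vocabulary model would."""
    return [t if t in vocab else "<unk>" for t in tokens]

# Hypothetical merge table and vocabularies (illustration only).
MERGES = [("ਸ", "ੁ"), ("ਣ", "ੀ"), ("ਲ", "ਖ")]
SUBWORD_VOCAB = {a + b for a, b in MERGES}  # only merged pieces, for brevity
WORD_VOCAB = {"ਉਸਦੀ", "ਪਤਨੀ", "ਦਾ", "ਨਾਮ", "ਦੇਵੀ", "ਹੈ।"}  # no ਸੁਲਖਣੀ

word = "ਸੁਲਖਣੀ"
print(lookup([word], WORD_VOCAB))             # word level: the whole word is <unk>
pieces = bpe_encode(word, MERGES)
print(pieces, lookup(pieces, SUBWORD_VOCAB))  # subword level: every piece is known
```

In practice, you would not hand-roll this. SentencePiece's `spm_train` learns the segmentation from the training data (for example with `--model_type=bpe`, or with its default unigram model). The same trained model must then segment both the training and the test source text. Whether the NMT model then reproduces the pieces correctly in the English output still depends on what it saw during training.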