Question about out-of-vocabulary words in OpenNMT-py

pytorch

(Gmramaswamy) #1

Hello,
I am just getting started with OpenNMT-py for translation. Pardon me if I have missed something basic here.
The issue I am facing concerns the handling of OOV (out-of-vocabulary) words. My requirement is that, during translation, OOV words are returned exactly as they appear in the source, without translation, because they are proper nouns. I am unable to achieve that: the translation outputs some erroneous word in place of the OOV word.

From what I could see in past issues, one recommendation was to preprocess the data with two additional parameters, -dynamic_dict and -share_vocab, and to run translation with -replace_unk. This did not solve the problem.
I also tried preprocessing with the above-mentioned parameters and additionally passing -copy_attn during model training. A past ticket mentioned that this is not needed, but I tried it anyway. It did not work either.
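
For context, my understanding is that -replace_unk substitutes each generated <unk> token with the source token that received the highest attention weight at that step. Below is a minimal sketch of that idea only. The function name, the toy sentences, and the attention values are all made up for illustration, and this is not OpenNMT-py's actual code.

```python
from typing import List, Sequence

UNK = "<unk>"  # assumed spelling of the unknown-word token


def replace_unk(
    src_tokens: Sequence[str],
    pred_tokens: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[str]:
    """Hypothetical helper: copy a source token over each <unk> in the output.

    attention[i][j] is the weight placed on source token j while
    producing predicted token i.
    """
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok == UNK:
            weights = attention[i]
            # Pick the source position with the largest attention weight.
            best = max(range(len(src_tokens)), key=lambda j: weights[j])
            out.append(src_tokens[best])
        else:
            # Anything that is not <unk> is left untouched.
            out.append(tok)
    return out


# Toy data, invented for this example.
src = ["Ganesh", "lives", "in", "Chennai"]
pred = ["<unk>", "habite", "à", "<unk>"]
attn = [
    [0.85, 0.05, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.15, 0.70, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]

result = replace_unk(src, pred, attn)
print(result)
```

In this sketch the replacement only fires when the output token is literally <unk>. If the model emits some other in-vocabulary word for the proper noun, which is what I seem to be seeing, nothing is replaced.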

Can someone please tell me what I need to do to ensure that OOV words are returned as-is, without translation?

Thanks,
Ganesh


(Guillaume Klein) #2

This was answered on GitHub: