Non-source text should be kept as it is in the Target

KishorKP · September 3, 2019, 10:30am

Hi,

I am currently using the built-in OpenNMT model for English-German translation. But when the input contains
non-English characters, such as Japanese or German, the output gets garbled. How can I enable an option that keeps non-English characters as they are in the translated text?

Please find the output below:
INPUT ==> {‘text’: ‘00003 AUTHOR. (05.04.07) ｽｽﾞｷｶｽﾞﾋｺ. LV049\n’},

OUTPUT ==> {‘Text’: '00003 AUTHOR (05.04.07) Streife Herr Gescheigene Geschößerscheinhöfe fallen zufällig.

Please let me know if there is an option that could be used to achieve this.

Thank You,
Kishor

guillaumekln · September 10, 2019, 10:19am

Hi,

There is no such option in OpenNMT-py. This should probably be handled in a preprocessing step.

KishorKP · September 11, 2019, 11:49am

Hi,

How do we handle this in a preprocessing step?

Please share a link for this if there is one.

Thank You,
Kishor.

guillaumekln · September 11, 2019, 12:34pm

There is no resource in OpenNMT on this subject.

Maybe in your application, verify the language of the input and, if it does not match the model's source language, do not send the request?
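
A minimal sketch of that kind of check, applied per segment rather than per request. It is not an OpenNMT feature: the `translate` argument is a hypothetical stand-in for whatever call sends text to the model, and the function name and the U+024F cut-off are assumptions made for illustration. Runs of characters outside the Latin blocks are copied through unchanged, and only the remaining pieces are sent for translation.

```python
import re
from typing import Callable

# Assumption: any character above U+024F (the end of Latin Extended-B) is
# outside the script the model was trained on. Half-width katakana such as
# "ｽｽﾞｷ" (U+FF66 to U+FF9F) falls in this range.
NON_LATIN_RUN = re.compile(r"[^\u0000-\u024F]+")


def translate_keeping_foreign(text: str, translate: Callable[[str], str]) -> str:
    """Translate Latin-script pieces of `text`; copy other runs verbatim.

    `translate` is a hypothetical callable wrapping the real model request.
    """
    pieces = []
    last = 0
    for match in NON_LATIN_RUN.finditer(text):
        segment = text[last:match.start()]
        # Only send segments that contain something besides whitespace.
        pieces.append(translate(segment) if segment.strip() else segment)
        # Keep the non-Latin run exactly as it appeared in the source.
        pieces.append(match.group())
        last = match.end()
    tail = text[last:]
    pieces.append(translate(tail) if tail.strip() else tail)
    return "".join(pieces)


if __name__ == "__main__":
    # str.upper stands in for the real translation call.
    print(translate_keeping_foreign("abc ｽｽﾞｷ def", str.upper))
```

Two limitations are worth keeping in mind. Translating each piece separately means the model sees less context, and it may not preserve leading or trailing whitespace of a piece. A script check like this also cannot tell German from English, since both use Latin characters; that case would need an actual language identification step before sending the request.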