I am trying to train OpenNMT-py for Myanmar-English translation. I have already created the `en` and `my` source files to build the model. The problem is that the sentences in my Myanmar file are not segmented into words: in some languages (Thai, Myanmar, etc.), spaces are not required to separate words. Because of this, my accuracy is always below 30%.
I tried the OpenNMT (Torch) preprocess script with its tokenizer options, but those arguments are not available in OpenNMT-py.
Has anyone come across a way to segment Myanmar sentences into words? I could only find syllable segmentation, not word segmentation. Or is there any workaround for this?
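In case it helps others, here is a minimal sketch of the syllable-level segmentation I mentioned, assuming a simplified version of the rule used by tools like sylbreak: insert a break before any Myanmar consonant (U+1000-U+1021) that is not killed by asat (U+103A) and not part of a stacked consonant pair (virama U+1039 before or after it). The function name and the exact character ranges are my own simplification, not a standard API.

```python
import re

# Simplified syllable-break rule (an assumption, modeled loosely on sylbreak):
# break before a Myanmar consonant unless it carries asat (U+103A)
# or sits in a virama (U+1039) stack.
_BREAK = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103A\u1039])")

def syllable_segment(text: str) -> list[str]:
    """Split a Myanmar string into syllables by inserting spaces
    before syllable-initial consonants, then splitting on whitespace."""
    marked = _BREAK.sub(r" \1", text)
    return marked.split()

# Example: "မြန်မာ" -> ["မြန်", "မာ"]
```

A common workaround when no word segmenter is available is to apply subword tokenization (BPE or SentencePiece) on top of this syllable-segmented text, which often closes much of the accuracy gap for unsegmented scripts.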
Thank you in advance.