Japanese training

(liluhao1982) #21

If so, it would be better to add an option to disable tokenization when using the REST server, rather than removing automatic tokenization/detokenization altogether. It is a great feature, and most of the time it works well; the exceptions are languages that need a special morphological analyzer, e.g. Japanese and Chinese. I previously needed automatic tokenization/detokenization for the REST server myself.
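
To make the idea concrete, here is a minimal sketch of such an option. It is not the actual REST server code: `prepare_source`, `default_tokenizer`, and the `tokenize` flag are hypothetical names used only for illustration, and the regex tokenizer is a stand-in for whatever the server really does.

```python
import re


def default_tokenizer(text):
    # Stand-in for the server's automatic tokenizer (an assumption):
    # split into runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def prepare_source(text, tokenize=True):
    # Hypothetical option: with tokenize=False the text is assumed to be
    # already segmented (e.g. by an external Japanese morphological
    # analyzer), so it is only split on whitespace.
    if not tokenize:
        return text.split()
    return default_tokenizer(text)


# Automatic tokenization is fine for space-delimited languages.
english = prepare_source("Hello, world!")

# Unsegmented Japanese stays as one long token, which is the problem.
japanese_auto = prepare_source("私は学生です。")

# Pre-segmented input passes through untouched when the option is off.
japanese_pre = prepare_source("私 は 学生 です 。", tokenize=False)
```

With the option off, the client stays responsible for segmentation, while everyone else keeps the automatic behaviour by default.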

(Tnkmsh) #22

Thanks for the good information. I’m also working on translation involving Asian languages and would like to use this tokenizer with ONMT, but I don’t know how to change the options. Do I need to change the source code?