The baseline set for EN/ZH contains files with loctok.zh and .zh extensions.
Were the Chinese sets tokenized with tokenize.lua as well, or with a different tokenizer? If a different one, which was it?
Thanks.
NB: same question for Korean and Japanese.
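For reference, this is roughly how I would expect tokenize.lua to have been run (a minimal sketch; the -mode flag and file names here are my assumptions, not the actual commands used for the baseline):

    # tokenize raw Chinese text with OpenNMT's Lua tokenizer
    th tools/tokenize.lua -mode conservative < train.zh > train.tok.zh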