Hello guys! Thank you for all the help so far. I’ve really been trying to get the tokenizer to work, but to no avail. Am I typing it in wrong? If so, what is going on?
This is my command line and its output:
(base) PS>perl .\tools\tokenizer.perl -l zh -threads 4 tools\tgt-train.txt tools\output_en.tok.txt
Tokenizer Version 1.1
Language: zh
Number of threads: 4
It just stops there and does not tokenize my file at all.
How is tokenization done for the training, test, and validation data? I’m using OpenNMT-py and would like to tokenize parallel data, i.e. English-Mizo. Any help would be really appreciated.
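A likely cause of the "it just stops" symptom: the Moses tokenizer.perl script reads text from standard input and writes to standard output, so file names passed as extra arguments are not read and the script sits waiting for input. Below is a minimal Python sketch, under assumptions, that pipes each file through the tokenizer and repeats that for every split and both sides of the corpus. The data/{train,valid,test}.{en,lus} layout, the helper names, and the "lus" language code for Mizo are all hypothetical. For a language without its own abbreviation list, the script may warn and fall back to English rules.

```python
import subprocess
from pathlib import Path

# Assumed location of the Moses script (taken from the command above).
TOKENIZER = Path("tools") / "tokenizer.perl"

def moses_cmd(lang, threads=4, script=TOKENIZER):
    """Build the tokenizer.perl command line for one language (no file args)."""
    return ["perl", str(script), "-l", lang, "-threads", str(threads)]

def tokenize_file(cmd, src, dst):
    """Feed src to cmd on stdin and save cmd's stdout to dst.

    Files are opened in binary mode so UTF-8 text passes through untouched.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)

# Hypothetical layout: data/train.en, data/train.lus, data/valid.en, ...
SPLITS = ["train", "valid", "test"]
LANGS = {"en": "en", "lus": "lus"}  # file suffix -> tokenizer -l code (assumed)

def tokenize_corpus(data_dir="data", threads=4):
    """Tokenize every split and both sides with the same settings."""
    for split in SPLITS:
        for suffix, lang in LANGS.items():
            src = Path(data_dir) / f"{split}.{suffix}"
            dst = Path(data_dir) / f"{split}.tok.{suffix}"
            tokenize_file(moses_cmd(lang, threads), src, dst)
```

Calling tokenize_corpus() would write data/train.tok.en, data/train.tok.lus, and so on. Using subprocess with file handles (instead of a PowerShell pipe) also avoids PowerShell re-encoding the text on its way to perl.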