OpenNMT Forum

How do I use tokenizer.perl?

Hello guys! Thank you for all the help so far. I've really been trying to get the tokenizer to work, but to no avail. Am I typing it in wrong? If so, what is going on?
This is my command line:

(base) PS>perl .\tools\tokenizer.perl -l zh -threads 4 tools\tgt-train.txt tools\output_en.tok.txt
Tokenizer Version 1.1
Language: zh
Number of threads: 4

It just ends there and does not tokenize my file at all.

Just specify the input and output using redirection (< and >).
I also suggest adding this option:

-no-escape

Try the following:

perl ./tools/tokenizer.perl -l zh -threads 4 -no-escape < tgt-train.txt > output_en.tok.txt
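
As far as I know, tokenizer.perl reads from standard input and writes to standard output, and does not open file names given as arguments. That would explain why the original command prints its banner and then sits there: it is waiting for input. Here is a minimal sketch of a wrapper for a POSIX-style shell. The name run_filter is hypothetical (it is not part of OpenNMT), and the file names in the comment are just the ones from this thread. Also note that PowerShell does not accept < for input redirection, so a command like the one above needs a POSIX-style shell or cmd.exe.

```shell
# Hypothetical helper (not part of OpenNMT): run any stdin/stdout filter
# over an input file and write the result to an output file.
# Usage: run_filter INPUT OUTPUT COMMAND [ARGS...]
run_filter() {
    in_file=$1
    out_file=$2
    shift 2
    # Fail early with a clear message instead of leaving the filter waiting.
    if [ ! -r "$in_file" ]; then
        echo "run_filter: cannot read $in_file" >&2
        return 1
    fi
    # The filter sees the file contents on stdin; its stdout goes to the file.
    "$@" < "$in_file" > "$out_file"
}

# Example call, assuming you are in the OpenNMT directory:
# run_filter tgt-train.txt output.tok.txt \
#     perl ./tools/tokenizer.perl -l zh -threads 4 -no-escape
```

Keeping the command as the trailing arguments means the same wrapper works with other options or another language code.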


Use it like this:

./tools/tokenizer.perl -l en < /home/OpenNMT-py/zrinput.txt5 > /home/OpenNMT-py/zrinput.txt51
