-segment_numbers option?



I want to ask about the -segment_numbers option.

If I set this option when I tokenize, can I see its effect in my output file?

This is my command:

th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true  < input_test_en.txt > test.tok

The output looks like this:

the│C convention│L in│L 1912│N led│L to│L a│L split│L republican│C party│C ■.│N

I expected 1912 to be segmented into 1 9 1 2, but nothing changed.

Please help me.
Thank you.

(Guillaume Klein) #2


There seems to be an undocumented option dependency: -segment_numbers only takes effect when you also set -mode aggressive.
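
To check in the output file whether the option took effect, one possibility is to scan it for tokens that still contain two or more adjacent digits. The snippet below is only a rough sketch and not part of OpenNMT: the helper name unsegmented_numbers is made up for illustration, and it assumes the feature separator "│" and the joiner "■" appear exactly as in the output pasted above.

```python
import re

# Assumptions taken from the output shown in this thread:
# features are appended to each token after "│", and joiners are written as "■".
FEATURE_SEP = "│"
JOINER = "■"


def unsegmented_numbers(line, feature_sep=FEATURE_SEP, joiner=JOINER):
    """Return the tokens of one tokenized line that still hold 2+ adjacent digits."""
    found = []
    for token in line.split():
        # Drop the appended features and any joiner marks to get the surface form.
        surface = token.split(feature_sep, 1)[0].replace(joiner, "")
        if re.search(r"\d\d", surface):
            found.append(surface)
    return found


sample = ("the│C convention│L in│L 1912│N led│L to│L a│L split│L "
          "republican│C party│C ■.│N")
print(unsegmented_numbers(sample))  # ['1912']
```

Running the same function over every line of test.tok after adding -mode aggressive should, if the digits are now split, return an empty list for each line.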

(jean.senellart) #3

@jiny, please file an issue on GitHub - the option should not depend on aggressive mode. Thanks.