-segment_numbers option?

jiny · January 30, 2018, 5:07am

Hello!

I have a question about the -segment_numbers option.

If I set this option when I tokenize, should I see its effect in my output file?

This is my command:

th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok

and the output looks like this:

the￨C convention￨L in￨L 1912￨N led￨L to￨L a￨L split￨L republican￨C party￨C ￭.￨N

I expected 1912 to be segmented like 1 9 1 2, but there is no change…

Please help me.
Thank you.

guillaumekln · January 30, 2018, 9:38am

Hello,

There seems to be an undocumented option dependency: you should also set the option -mode aggressive.
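
As a sketch, the command from the question with that flag added would look something like the following. Only `-mode aggressive` is new; everything else is copied from the original command, and the resulting output has not been verified here.

```shell
# Same tokenization command as in the question, with -mode aggressive added
# so that -segment_numbers can take effect (per the dependency noted above).
th tools/tokenize.lua -mode aggressive -case_feature true -segment_case true \
  -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok
```

Keep in mind that aggressive mode may also change how other tokens are split, so it is worth comparing the new test.tok against the previous one.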

jean.senellart · January 30, 2018, 12:53pm

@jiny, please file an issue on GitHub: the option should not depend on aggressive mode. Thanks.