OpenNMT Forum

OpenNMT-tf tokenizer issue when using case_feature

This is my command and config file.

$ onmt-build-vocab --tokenizer_config ./agg.yml --size 50000 --save_vocab test.tok test.txt
  • agg.yml
mode: aggressive
joiner_annotate: true
segment_numbers: true
segment_alphabet_change: true
case_feature: true
  • test.txt
WiFi
korea
KOREA
  • test.tok
<blank>
<s>
</s>
korea
wifi

Since case_feature is set to true, I expected both korea and KOREA to appear in test.tok, but only korea does.
Where is the case_feature information stored?

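When case_feature is enabled in OpenNMT-tf, tokens are lowercased, but the case feature itself is ignored, so it is not written to the vocabulary. KOREA and korea therefore map to the same entry, korea, and the vocabulary file you got is the expected output.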
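To make that concrete, here is a rough pure-Python sketch of the effect. It does not call the OpenNMT tokenizer or onmt-build-vocab. The helpers case_label and build_vocab are hypothetical, the single-letter labels (L, U, C, M, N) are an assumption about how the case feature is encoded, and each line of test.txt is treated as a single token.

```python
from collections import Counter

# Special tokens as they appear at the top of test.tok.
SPECIAL_TOKENS = ["<blank>", "<s>", "</s>"]


def case_label(token):
    """Hypothetical helper: classify the casing of a token.

    The label letters are an assumption made for illustration only.
    """
    if not any(ch.isalpha() for ch in token):
        return "N"  # no letters at all
    if token.islower():
        return "L"  # all lowercase
    if token.isupper():
        return "U"  # all uppercase
    if token[0].isupper() and token[1:].islower():
        return "C"  # capitalized
    return "M"      # mixed case


def build_vocab(lines):
    """Hypothetical helper: count lowercased tokens, most frequent first.

    The case label is deliberately not used here, which mirrors the
    feature being ignored when the vocabulary is built.
    """
    counts = Counter(line.strip().lower() for line in lines if line.strip())
    return SPECIAL_TOKENS + [token for token, _ in counts.most_common()]


lines = ["WiFi", "korea", "KOREA"]  # contents of test.txt

for word in lines:
    # Show the lowercased token next to the case label that gets dropped.
    print(word, "->", word.lower(), case_label(word))

vocab = build_vocab(lines)
print(vocab)
```

In this sketch korea and KOREA collapse into one entry with a count of 2, and WiFi becomes wifi, so the resulting list has the same entries as test.tok above.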