OpenNMT Forum

"List out of range" while preprocessing the data

I am trying to replicate the French-to-English Transformer model results on the WMT-14 dataset.
Command I used:

onmt_preprocess -train_src data/BPE/train.src -train_tgt data/BPE/train.tgt -valid_src data/BPE/val.src -valid_tgt data/BPE/val.tgt -save_data data/BPE/en_fr -src_vocab_size 100000 -tgt_vocab_size 100000

Traceback (most recent call last):
  File "/home/translateme/.local/bin/onmt_preprocess", line 11, in <module>
    sys.exit(main())
  File "/home/translateme/.local/lib/python3.6/site-packages/onmt/bin/preprocess.py", line 298, in main
    preprocess(opt)
  File "/home/translateme/.local/lib/python3.6/site-packages/onmt/bin/preprocess.py", line 256, in preprocess
    src_nfeats += count_features(src) if opt.data_type == 'text'
  File "/home/translateme/.local/lib/python3.6/site-packages/onmt/bin/preprocess.py", line 241, in count_features
    first_tok = f.readline().split(None, 1)[0]
IndexError: list index out of range

Before this, I applied BPE to the raw data, and I am feeding only those BPE-processed files to the preprocessing step.

Any suggestions for this error?

Many Thanks in advance.

This seems strange. Maybe one of the data files is empty?
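
One quick way to check this: the failing line in the traceback is `f.readline().split(None, 1)[0]`, and `split` returns an empty list when the first line of the file is empty or contains only whitespace, which would raise exactly this IndexError. Below is a minimal sketch for checking that. The helper names `first_line_problem` and `count_blank_lines` are made up for illustration (they are not part of OpenNMT), the paths are copied from the command above, and UTF-8 encoding is an assumption.

```python
from pathlib import Path


def first_line_problem(path):
    """Return a short description of a first-line problem, or None if it looks fine.

    Hypothetical helper, not part of OpenNMT.
    """
    p = Path(path)
    if not p.is_file():
        return "file not found"
    # Assumption: the data files are UTF-8 encoded.
    with p.open("r", encoding="utf-8") as f:
        first = f.readline()
    if first == "":
        return "file is empty"
    # str.split(None, 1) returns [] for a whitespace-only string,
    # so indexing [0] on the result raises IndexError.
    if not first.split(None, 1):
        return "first line is blank"
    return None


def count_blank_lines(path):
    """Count lines that are empty or whitespace-only anywhere in the file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if not line.strip())


if __name__ == "__main__":
    # Paths taken from the onmt_preprocess command in the question.
    paths = [
        "data/BPE/train.src",
        "data/BPE/train.tgt",
        "data/BPE/val.src",
        "data/BPE/val.tgt",
    ]
    for path in paths:
        problem = first_line_problem(path)
        if problem is not None:
            print(f"{path}: {problem}")
        else:
            print(f"{path}: first line OK, {count_blank_lines(path)} blank line(s)")
```

If any file is reported as empty or as having a blank first line, that file is the likely cause. Blank lines further down would not trigger this particular error, but they may be worth cleaning up in the source and target files together so the two sides stay aligned.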

@francoishernandez I checked, and the data is fine.