Hi everyone,
I am new to OpenNMT-py. I ran into the error below during training, at around step 1450. Does anyone have a solution?
Thanks
[2021-11-23 10:34:43,237 INFO] Step 1450/100000; acc: 31.48; ppl: 56.27; xent: 4.03; lr: 0.00018; 5181/5880 tok/s; 3156 sec
Traceback (most recent call last):
  File "/opt/conda/bin/onmt_train", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/onmt/bin/train.py", line 172, in main
    train(opt)
  File "/opt/conda/lib/python3.7/site-packages/onmt/bin/train.py", line 157, in train
    train_process(opt, device_id=0)
  File "/opt/conda/lib/python3.7/site-packages/onmt/train_single.py", line 114, in main
    valid_steps=opt.valid_steps)
  File "/opt/conda/lib/python3.7/site-packages/onmt/trainer.py", line 225, in train
    self._accum_batches(train_iter)):
  File "/opt/conda/lib/python3.7/site-packages/onmt/trainer.py", line 163, in _accum_batches
    for batch in iterator:
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/inputter.py", line 234, in __iter__
    for batch in self.iterable:
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/dynamic_iterator.py", line 181, in __iter__
    for bucket in self._bucketing():
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/dynamic_iterator.py", line 176, in _bucketing
    yield from buckets
  File "/opt/conda/lib/python3.7/site-packages/torchtext/data/iterator.py", line 261, in batch
    for ex in data:
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/dynamic_iterator.py", line 77, in __iter__
    item = next(iterator)
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/corpus.py", line 268, in __iter__
    yield from indexed_corpus
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/corpus.py", line 246, in _add_index
    for i, item in enumerate(stream):
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/corpus.py", line 230, in _transform
    for example in stream:
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/corpus.py", line 218, in _tokenize
    for example in stream:
  File "/opt/conda/lib/python3.7/site-packages/onmt/inputters/corpus.py", line 149, in load
    tline = tline.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 95: invalid continuation byte
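
The last frame of the traceback shows a line of a corpus file being decoded as UTF-8 and failing, which suggests that at least one line in the training data is not valid UTF-8. The traceback does not say which file or which line, so here is a minimal sketch for finding the offending lines. It uses only the Python standard library and is not part of OpenNMT-py; the function name `find_invalid_utf8_lines` and the file name in the usage comment are my own placeholders.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, byte_offset, reason) for each line that is not valid UTF-8.

    Hypothetical helper for illustration; reads the file as raw bytes and
    tries to decode each line separately, like the failing frame does.
    """
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the bad byte within this line
                bad.append((lineno, err.start, err.reason))
    return bad


# Usage (replace the placeholder with each source/target file from your config):
# for lineno, offset, reason in find_invalid_utf8_lines("train.src"):
#     print(f"line {lineno}, byte {offset}: {reason}")
```

If this reports any lines, one possible cause is that the file was saved in a different encoding (for example Latin-1 or Windows-1252, where 0xe2 is "â"). In that case, converting the file to UTF-8 once the real encoding is confirmed, or removing the broken lines from both the source and target files so they stay aligned, should let training get past this point.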