Hi,
I’ve got the following error when trying to preprocess my data for training:
[COMMAND] onmt_preprocess -train_src train/corpus.train.src.tok -train_tgt train/corpus.train.tgt.tok -valid_src dev/corpus.dev.src.tok -valid_tgt dev/corpus.dev.tgt.tok -save_data preprocess/preprocessed --num_threads 2 --src_seq_length 150 --tgt_seq_length 150
[2020-09-04 15:56:51,394 INFO] Extracting features...
[2020-09-04 15:56:51,394 INFO] * number of source features: 0.
[2020-09-04 15:56:51,394 INFO] * number of target features: 0.
[2020-09-04 15:56:51,394 INFO] Building `Fields` object...
[2020-09-04 15:56:51,394 INFO] Building & saving training data...
Traceback (most recent call last):
  File "run/venv/bin/onmt_preprocess", line 11, in <module>
    load_entry_point('OpenNMT-py==1.1.1', 'console_scripts', 'onmt_preprocess')()
  File "run/venv/lib/python3.6/site-packages/OpenNMT_py-1.1.1-py3.6.egg/onmt/bin/preprocess.py", line 318, in main
    preprocess(opt)
  File "run/venv/lib/python3.6/site-packages/OpenNMT_py-1.1.1-py3.6.egg/onmt/bin/preprocess.py", line 298, in preprocess
    'train', fields, src_reader, tgt_reader, align_reader, opt)
  File "run/venv/lib/python3.6/site-packages/OpenNMT_py-1.1.1-py3.6.egg/onmt/bin/preprocess.py", line 205, in build_save_dataset
    for sub_counter in p.imap(func, shard_iter):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
**struct.error: 'i' format requires -2147483648 <= number <= 2147483647**
Traceback (most recent call last):
  File "../run/run.py", line 293, in <module>
    raise Exception("There was an error preprocessing data")
Exception: There was an error preprocessing data
Any idea what’s happening here?
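
For context, the last frame of the first traceback can be reproduced on its own. The sketch below is only an illustration, under the assumption that `n` is the byte length of the pickled task being sent to a worker process, as the `_send_bytes(_ForkingPickler.dumps(obj))` frame suggests. The helper names `pack_length_header` and `fits_in_header` are made up for this example and are not part of OpenNMT-py or the standard library. It shows that the `"!i"` format is a 4-byte signed integer, so a length above 2147483647 (about 2 GiB) cannot be packed.

```python
import struct

# Largest value the "!i" (network-order, signed 32-bit) format accepts.
INT32_MAX = 2**31 - 1


def pack_length_header(n: int) -> bytes:
    """Pack a length with the same format string seen in the traceback.

    Hypothetical helper for illustration only.
    """
    return struct.pack("!i", n)


def fits_in_header(n: int) -> bool:
    """Return True if n can be packed as "!i", False if struct.error is raised."""
    try:
        pack_length_header(n)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(fits_in_header(INT32_MAX))      # a length of exactly 2**31 - 1 bytes
    print(fits_in_header(INT32_MAX + 1))  # one byte more than the limit
```

If that assumption holds, something handed to `p.imap` would be larger than 2 GiB once pickled, though the traceback alone does not show which object that is.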