Assertion Error, data not preprocessing correctly

Soap89 · June 6, 2019, 4:22am

Hello! For some reason I can’t seem to preprocess my data, even though I have aligned it correctly and saved my src-train, src-val, tgt-train and tgt-val files as UTF-8. I keep getting this error:
[2019-06-06 12:17:46,525 INFO] Extracting features…
[2019-06-06 12:17:46,526 INFO] * number of source features: 0.
[2019-06-06 12:17:46,526 INFO] * number of target features: 0.
[2019-06-06 12:17:46,527 INFO] Building Fields object…
[2019-06-06 12:17:46,527 INFO] Building & saving training data…
[2019-06-06 12:17:46,528 INFO] Reading source and target files: data/src-train.txt data/tgt-train.txt.
Traceback (most recent call last):
  File "preprocess.py", line 217, in <module>
    main(opt)
  File "preprocess.py", line 198, in main
    'train', fields, src_reader, tgt_reader, opt)
  File "preprocess.py", line 83, in build_save_dataset
    assert len(src_shard) == len(tgt_shard)
AssertionError
Are my datasets in the wrong format? If so, how should I change them?

vince62s · June 6, 2019, 6:20am

HongChow · November 9, 2019, 3:22am

Have you fixed this problem?

HongChow · November 11, 2019, 2:37am

Something was wrong with the dataset.
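
The assertion in the traceback compares the number of source and target examples in a shard, so a first thing to check is whether the two files have the same number of lines. Here is a minimal sketch of that check. The train paths come from the log above, the val paths are my guess at the naming, and `count_lines` and `check_parallel` are hypothetical helper names, not part of OpenNMT. It counts lines in binary mode, which I assume is close to how preprocess.py splits the files, but I have not verified that.

```python
from pathlib import Path


def count_lines(path):
    """Count newline-delimited lines, reading bytes so no decoding is involved."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_parallel(src_path, tgt_path):
    """Return (src_count, tgt_count, counts_match) for a source/target pair."""
    n_src = count_lines(src_path)
    n_tgt = count_lines(tgt_path)
    return n_src, n_tgt, n_src == n_tgt


if __name__ == "__main__":
    # Train paths are from the log above; val paths are assumed. Adjust as needed.
    pairs = [
        ("data/src-train.txt", "data/tgt-train.txt"),
        ("data/src-val.txt", "data/tgt-val.txt"),
    ]
    for src, tgt in pairs:
        if Path(src).exists() and Path(tgt).exists():
            n_src, n_tgt, ok = check_parallel(src, tgt)
            print(f"{src}: {n_src} lines, {tgt}: {n_tgt} lines, match={ok}")
        else:
            print(f"skipping {src} / {tgt}: file not found")
```

If the counts differ, the files are not aligned line for line, and that would be enough to trigger the AssertionError. Matching counts do not prove the data is correct, only that this particular check passes.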