Hello! I am trying to train on my own dataset, but after preprocessing I can’t seem to get a ‘data/demo.vocab.pt’ file from the dataset. I am also getting an assertion error. What am I doing wrong?
[2019-06-04 18:56:09,553 INFO] Extracting features...
[2019-06-04 18:56:09,597 INFO] * number of source features: 0.
[2019-06-04 18:56:09,597 INFO] * number of target features: 0.
[2019-06-04 18:56:09,597 INFO] Building `Fields` object...
[2019-06-04 18:56:09,598 INFO] Building & saving training data...
[2019-06-04 18:56:09,598 INFO] Reading source and target files: data/src-train.txt data/tgt-train.txt.
[2019-06-04 18:56:11,381 INFO] Building shard 0.
[2019-06-04 18:56:37,781 INFO] * saving 0th train data shard to data/demo.train.0.pt.
[2019-06-04 18:57:06,543 INFO] Building shard 1.
[2019-06-04 18:57:34,903 INFO] * saving 1th train data shard to data/demo.train.1.pt.
[2019-06-04 18:58:04,455 INFO] Building shard 2.
[2019-06-04 18:58:32,880 INFO] * saving 2th train data shard to data/demo.train.2.pt.
[2019-06-04 18:59:02,682 INFO] Building shard 3.
[2019-06-04 18:59:32,552 INFO] * saving 3th train data shard to data/demo.train.3.pt.
[2019-06-04 19:00:02,571 INFO] Building shard 4.
[2019-06-04 19:00:32,264 INFO] * saving 4th train data shard to data/demo.train.4.pt.
[2019-06-04 19:01:01,782 INFO] Building shard 5.
[2019-06-04 19:01:31,756 INFO] * saving 5th train data shard to data/demo.train.5.pt.
[2019-06-04 19:02:01,825 INFO] Building shard 6.
[2019-06-04 19:02:33,348 INFO] * saving 6th train data shard to data/demo.train.6.pt.
[2019-06-04 19:03:02,821 INFO] Building shard 7.
[2019-06-04 19:03:38,801 INFO] * saving 7th train data shard to data/demo.train.7.pt.
[2019-06-04 19:04:08,545 INFO] Building shard 8.
[2019-06-04 19:04:39,184 INFO] * saving 8th train data shard to data/demo.train.8.pt.
[2019-06-04 19:05:08,535 INFO] Building shard 9.
[2019-06-04 19:05:40,973 INFO] * saving 9th train data shard to data/demo.train.9.pt.
Traceback (most recent call last):
  File "preprocess.py", line 217, in <module>
    main(opt)
  File "preprocess.py", line 198, in main
    'train', fields, src_reader, tgt_reader, opt)
  File "preprocess.py", line 83, in build_save_dataset
    assert len(src_shard) == len(tgt_shard)
AssertionError
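
The failing assertion compares the number of entries in a source shard with the number in the matching target shard. One possible cause (an assumption, since the log does not confirm it) is that data/src-train.txt and data/tgt-train.txt do not contain the same number of lines. Below is a minimal sketch for checking that. The helper names `count_lines` and `compare_parallel_files` are hypothetical and are not part of preprocess.py.

```python
def count_lines(path):
    """Count the lines in a file, reading it in binary mode.

    Binary mode splits only on b"\n", so other Unicode line
    separators inside the text are not counted as line breaks.
    """
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def compare_parallel_files(src_path, tgt_path):
    """Return (src_count, tgt_count, counts_match) for two parallel files.

    Hypothetical helper for illustration only; it is not part of
    preprocess.py.
    """
    src_count = count_lines(src_path)
    tgt_count = count_lines(tgt_path)
    return src_count, tgt_count, src_count == tgt_count
```

Calling `compare_parallel_files("data/src-train.txt", "data/tgt-train.txt")` with the paths from the log returns both counts and whether they match. If they differ, some shard would end up with unequal source and target lengths, which would be consistent with the AssertionError above.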