Error while preprocessing the data


(Jaffer Wilson) #1

I ran into the following error while trying to preprocess my dataset:

python preprocess.py -train_src data/train/article.txt -train_tgt data/train/summary.txt -valid_src data/train/article.txt -valid_tgt data/train/summary.txt -save_data data/train/t
Building source vocabulary...
Created dictionary of size 4227 (pruned from 4227)
Building target vocabulary...
Created dictionary of size 675 (pruned from 675)
Preparing training ...
Processing data/train/article.txt & data/train/summary.txt ...
... shuffling sentences
Traceback (most recent call last):
  File "preprocess.py", line 237, in <module>
  File "preprocess.py", line 216, in main
    dicts['src'], dicts['tgt'])
  File "preprocess.py", line 184, in makeData
    perm = torch.randperm(len(src))
RuntimeError: must be strictly positive at /home/ubuntu/pytorch/torch/lib/TH/generic/THTensorMath.c:1984

Please let me know how to fix this. Thank you in advance.
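
For anyone trying to narrow this down: the traceback fails at torch.randperm(len(src)), which suggests that src was empty by the time shuffling started, even though both vocabularies were built. One possible cause, which is an assumption and not something the log confirms, is that every article/summary pair was skipped beforehand, for example because a line was empty or longer than a maximum sequence length. The sketch below is not code from preprocess.py; the function names and the limit of 50 tokens are hypothetical and only illustrate how to count the pairs that would survive such a filter.

```python
def count_usable_pairs(src_lines, tgt_lines, max_len=50):
    """Count sentence pairs that survive a simple length filter.

    max_len=50 is an assumed limit for illustration only; check the
    options of your own preprocess.py for the real value, if any.
    """
    kept = 0
    for src, tgt in zip(src_lines, tgt_lines):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        # Skip pairs where either side has no tokens.
        if not src_tokens or not tgt_tokens:
            continue
        # Skip pairs where either side exceeds the assumed limit.
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
            continue
        kept += 1
    return kept


def count_usable_pairs_in_files(src_path, tgt_path, max_len=50):
    """Same count, reading the two parallel text files line by line."""
    with open(src_path, encoding="utf-8") as src_file, \
            open(tgt_path, encoding="utf-8") as tgt_file:
        return count_usable_pairs(src_file, tgt_file, max_len)
```

Calling count_usable_pairs_in_files("data/train/article.txt", "data/train/summary.txt") with the paths from the command above returns the number of pairs that pass this assumed filter. If that number is 0, an empty src list would explain the error, and it would be worth looking at python preprocess.py -h for an option that raises the length limit, if the script has one.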
