Loading train dataset from data/***.train.1.pt, number of examples: 22307
the train src and tgt sentences are written to src-train.txt and tgt-train.txt respectively.
python3 preprocess.py -train_src data/convai2_new/src-train.txt -train_tgt data/convai2_new/tgt-train.txt -valid_src data/convai2_new/src-val.txt -valid_tgt data/convai2_new/tgt-val.txt -save_data data/convai2_new
Did you try adjusting the src_seq_length and tgt_seq_length options? They filter out sentences that are too long, so any pair exceeding the limit is dropped during preprocessing. Try values larger than the default (50).
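
To check whether the length filter explains the missing examples, a rough sketch like the one below counts how many sentence pairs fit within a given limit. It assumes whitespace tokenization and an inclusive limit, which only approximates what preprocess.py does; count_kept_pairs is a hypothetical helper, not part of OpenNMT.

```python
def count_kept_pairs(src_lines, tgt_lines, max_src_len=50, max_tgt_len=50):
    """Count pairs whose whitespace-token counts fit within both limits.

    Assumption: tokens are split on whitespace and empty sentences are
    dropped. The real preprocessing step may count tokens differently.
    """
    kept = 0
    for src, tgt in zip(src_lines, tgt_lines):
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        if 0 < src_len <= max_src_len and 0 < tgt_len <= max_tgt_len:
            kept += 1
    return kept


if __name__ == "__main__":
    # Toy data: the second source sentence has 60 tokens.
    src = ["hello there", " ".join(["w"] * 60)]
    tgt = ["hi", "ok"]
    print(count_kept_pairs(src, tgt))                   # limit 50
    print(count_kept_pairs(src, tgt, max_src_len=100))  # limit 100
```

If the count at a limit of 50 is close to the number of examples reported in the log, the filter is the likely cause, and rerunning preprocess.py with larger -src_seq_length and -tgt_seq_length values should keep the longer pairs.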
Thank you, you are right.