I am trying to train on my own dataset, specifically for summarization. As far as I know, OpenNMT has three steps: preprocess, train, and translate.
I have a question about the preprocess step.
I have gathered articles and their summaries.
I am trying to run the preprocess step as follows:
python preprocess.py -train_src article.txt -train_tgt summary -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum
But I do not understand what I need to pass for the -valid_src option.
Do I need to replace the validation files already used in the command above with my own? If so, what should I use in their place? I have used an article from the NYT for testing purposes.
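One option I am considering (this is my own assumption, not something I have confirmed for OpenNMT) is to hold out a small part of my own article/summary pairs and use those as the validation files. A minimal sketch, assuming the article and summary files are line-aligned (line i of one matches line i of the other); the function names and the 10% hold-out fraction are hypothetical choices of mine:

```python
import random


def split_parallel(src_lines, tgt_lines, valid_fraction=0.1, seed=0):
    """Split line-aligned source/target lists into (train, valid) lists of pairs."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    if len(src_lines) < 2:
        raise ValueError("need at least two pairs to make a train/valid split")

    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(src_lines)))
    random.Random(seed).shuffle(indices)

    # Hold out at least one pair for validation.
    n_valid = max(1, int(len(indices) * valid_fraction))
    valid_idx = set(indices[:n_valid])

    pairs = list(zip(src_lines, tgt_lines))
    train = [p for i, p in enumerate(pairs) if i not in valid_idx]
    valid = [p for i, p in enumerate(pairs) if i in valid_idx]
    return train, valid


def write_pairs(pairs, src_path, tgt_path):
    """Write pairs back out as two line-aligned text files."""
    with open(src_path, "w", encoding="utf-8") as f_src, \
         open(tgt_path, "w", encoding="utf-8") as f_tgt:
        for src, tgt in pairs:
            f_src.write(src.rstrip("\n") + "\n")
            f_tgt.write(tgt.rstrip("\n") + "\n")


def split_files(src_path, tgt_path, out_prefix, valid_fraction=0.1, seed=0):
    """Read two aligned files and write <out_prefix>.train.* and <out_prefix>.valid.* files."""
    with open(src_path, encoding="utf-8") as f:
        src_lines = f.read().splitlines()
    with open(tgt_path, encoding="utf-8") as f:
        tgt_lines = f.read().splitlines()
    train, valid = split_parallel(src_lines, tgt_lines, valid_fraction, seed)
    write_pairs(train, out_prefix + ".train.src.txt", out_prefix + ".train.tgt.txt")
    write_pairs(valid, out_prefix + ".valid.src.txt", out_prefix + ".valid.tgt.txt")
```

If this is the right idea, I would call split_files on my article and summary files and then pass the resulting train files to -train_src / -train_tgt and the valid files to -valid_src / -valid_tgt. The output file names here are only placeholders.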
Kindly help me.