OpenNMT Forum

What does preprocessing do with our dataset


(Quang Le) #1

Hi everyone, I’m fairly new to the OpenNMT-py toolkit. I’m following the quickstart steps on the OpenNMT-py documentation pages. When I ran the preprocessing step, it generated 3 files, which are *, * and *. I still don’t know what happened to my dataset when I ran it. Is there any explanation of this? Please share it with me. Thanks for your support :)

(Guillaume Klein) #2


The preprocessing does not do much. It builds the vocabularies from the most frequent tokens, filters out sentences that are too long, and assigns an index to each token.

Look at the code for more details.
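
To make those three steps concrete, here is a rough standalone sketch in plain Python. It is not OpenNMT-py's actual code: the function names, the `<unk>`/`<pad>` special tokens, the whitespace tokenization, and the default limits are all assumptions chosen for illustration.

```python
from collections import Counter


def build_vocab(sentences, max_size, specials=("<unk>", "<pad>")):
    """Map the most frequent tokens to integer ids (hypothetical helper)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    most_common = [tok for tok, _ in counts.most_common(max_size)]
    itos = list(specials) + most_common  # special tokens get the first ids
    return {tok: idx for idx, tok in enumerate(itos)}


def preprocess(lines, max_len=50, vocab_size=50000):
    """Filter long sentences, build a vocabulary, and index every token."""
    tokenized = [line.split() for line in lines]
    # Step 1: drop empty sentences and sentences longer than max_len tokens.
    kept = [sent for sent in tokenized if 0 < len(sent) <= max_len]
    # Step 2: build the vocabulary from the most frequent tokens.
    vocab = build_vocab(kept, vocab_size)
    # Step 3: replace each token by its index; rare tokens map to <unk>.
    unk = vocab["<unk>"]
    encoded = [[vocab.get(tok, unk) for tok in sent] for sent in kept]
    return vocab, encoded


if __name__ == "__main__":
    lines = ["the cat sat", "the dog sat on the mat today", "a cat"]
    vocab, encoded = preprocess(lines, max_len=4, vocab_size=3)
    print(vocab)
    print(encoded)
```

With `max_len=4` the second sentence is dropped, and with `vocab_size=3` the token `a` falls outside the vocabulary, so it is encoded as `<unk>`. The real toolkit does this separately for the source and target sides and then serializes the result to disk.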