What is the importance of validation files? Can you train without them?

mzeid · April 9, 2017, 6:54pm

Hello everybody,
I have just installed OpenNMT and I see some validation files under the "data" folder. What is the purpose of these validation files? Does OpenNMT create them automatically, or do I have to create them along with the parallel text files prior to training?

Thanks,
mzeid

jean.senellart · April 9, 2017, 7:16pm

Hello,

See the following topic talking about the role of validation data:

Although you could theoretically train without validation data, we blocked this in OpenNMT. We also don’t create validation files automatically, so you need to take 500-1000 sentences out of your bitext (and exclude those sentences from your training corpus) or, better, use some parallel data that corresponds exactly to your use case.
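
To illustrate the hold-out step, here is a minimal Python sketch (it is not part of OpenNMT). The function name `split_bitext`, the default of 1000 held-out sentences, and the fixed seed are assumptions made for this example. It assumes the source and target sides are already loaded as two aligned lists of lines.

```python
import random


def split_bitext(src_lines, tgt_lines, n_valid=1000, seed=1234):
    """Hold out n_valid aligned sentence pairs for validation.

    Returns (train_pairs, valid_pairs), each a list of (source, target)
    tuples. The held-out pairs are removed from the training set.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    if not 0 < n_valid < len(src_lines):
        raise ValueError("n_valid must be between 1 and the corpus size minus 1")

    # Pick the held-out line numbers at random, reproducibly.
    rng = random.Random(seed)
    valid_idx = set(rng.sample(range(len(src_lines)), n_valid))

    train_pairs, valid_pairs = [], []
    for i, pair in enumerate(zip(src_lines, tgt_lines)):
        # Each pair goes to exactly one of the two sets.
        (valid_pairs if i in valid_idx else train_pairs).append(pair)
    return train_pairs, valid_pairs
```

The two returned lists can then be written out as separate source and target files for training and for validation. Random sampling is only one option here; as noted above, validation data that matches the actual use case is preferable when it is available.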

mzeid · April 10, 2017, 4:15am

Thanks, Jean, for your reply! I read through the other thread, but I still don’t get the idea of using a subset of the training data for validation. I assume that training would not start if there are no validation files, right?