What is the importance of validation files? Can you train without them?


(Mohamed Zeid) #1

Hello everybody,
I have just installed OpenNMT and I see some validation files under "data". What is the purpose of these validation files? Does OpenNMT create them automatically, or do I have to create them along with the parallel text files prior to training?

Thanks,
mzeid


(jean.senellart) #2

Hello,

See the following topic talking about the role of validation data:

Although you could theoretically train without validation data, we have blocked that option in OpenNMT, and we don’t create validation files automatically. You just need to take 500-1000 sentences out of your bitext (and exclude these sentences from your training corpus) or, better, use some parallel data that corresponds exactly to your use case.
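For illustration only, here is a minimal Python sketch of holding out such a validation set. It is not part of OpenNMT: the function name `split_bitext`, its parameters, and the default of 1000 sentences are assumptions made for this example. It takes the source and target sides as two aligned lists of lines, picks a random sample of line positions, and returns the remaining pairs as training data so that the held-out pairs are excluded from it.

```python
import random


def split_bitext(src_lines, tgt_lines, n_valid=1000, seed=1234):
    """Hold out n_valid aligned sentence pairs for validation.

    Hypothetical helper, not an OpenNMT function. Returns
    (train_src, train_tgt, valid_src, valid_tgt).
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    if not 0 < n_valid < len(src_lines):
        raise ValueError("n_valid must be between 1 and the corpus size minus 1")

    # A fixed seed makes the split reproducible across runs.
    rng = random.Random(seed)
    valid_idx = set(rng.sample(range(len(src_lines)), n_valid))

    train_src, train_tgt, valid_src, valid_tgt = [], [], [], []
    for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines)):
        if i in valid_idx:
            # Held-out pair: goes to validation only.
            valid_src.append(src)
            valid_tgt.append(tgt)
        else:
            train_src.append(src)
            train_tgt.append(tgt)
    return train_src, train_tgt, valid_src, valid_tgt


# Toy usage with made-up data: 10 pairs, 3 held out.
src = [f"source sentence {i}" for i in range(10)]
tgt = [f"target sentence {i}" for i in range(10)]
train_src, train_tgt, valid_src, valid_tgt = split_bitext(src, tgt, n_valid=3)
print(len(train_src), len(valid_src))
```

Note that this splits by line position only. If the same sentence pair appears more than once in the corpus, a copy could still end up in the training set, so deduplicating beforehand may be worthwhile.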


(Mohamed Zeid) #3

Thanks, Jean, for your reply! I read through the other thread, but I still don’t get the idea of using a subset of the training data for validation. I assume that training would not start if there are no validation files, right?