I first propose supporting BLEU as a validation metric, which should be a very useful feature:
It's not exactly related to this feature request, as we don't save the translation output, but the idea is to introduce BLEU scoring (or a similar score) during the training process. Everything that used the validation perplexity now more generally uses the validation score, including the learning rate decay.
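To illustrate what "using the validation score" could mean for the learning rate decay, here is a minimal Python sketch. It is not the OpenNMT implementation: the function names, the decay rule (multiply the rate by a factor whenever the score fails to improve), and the `higher_is_better` flag are all assumptions made for the example. The point is only that BLEU improves upward while perplexity improves downward, so the comparison has to depend on the metric.

```python
def improved(score, best, higher_is_better):
    """Return True if `score` beats the best score seen so far."""
    if best is None:
        return True
    return score > best if higher_is_better else score < best


def decay_schedule(scores, lr, decay, higher_is_better):
    """Hypothetical rule: decay the rate after any epoch without improvement."""
    best = None
    rates = []
    for score in scores:
        if improved(score, best, higher_is_better):
            best = score
        else:
            lr *= decay
        rates.append(lr)
    return rates


# BLEU: higher is better. Perplexity: lower is better.
bleu_rates = decay_schedule([20.0, 24.0, 23.0, 25.0], 1.0, 0.5, higher_is_better=True)
ppl_rates = decay_schedule([12.0, 10.0, 11.0, 9.0], 1.0, 0.5, higher_is_better=False)
```

In both runs the third epoch is the one without improvement, so the rate is halved from that epoch on.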
Monitoring the evolution of BLEU should be more natural than monitoring perplexity. However, given how the preprocessing is set up, BLEU will be computed against gold sentences with resolved vocabulary, i.e. with OOV words replaced by <unk> tokens. I think this is a detail that does not affect how the score is interpreted, but let me know if you have any concerns about this.
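For anyone who wants to see the effect concretely, here is a small Python sketch of the clipped n-gram precision that BLEU is built on, computed against a raw reference and against the same reference with OOV words replaced by <unk>. The vocabulary, the sentences, and the helper names are made up for the example. This is not the scorer used in OpenNMT, and it leaves out the brevity penalty and the combination of n-gram orders.

```python
from collections import Counter

UNK = "<unk>"


def resolve_vocab(tokens, vocab):
    """Replace every token outside `vocab` with the <unk> placeholder."""
    return [tok if tok in vocab else UNK for tok in tokens]


def clipped_precision(hypothesis, reference, n=1):
    """Modified (clipped) n-gram precision of `hypothesis` against one reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Toy data (assumed): "chesterfield" is out of vocabulary.
vocab = {"the", "cat", "sat", "on", "mat"}
gold = "the cat sat on the chesterfield".split()
output = "the cat sat on the <unk>".split()

raw = clipped_precision(output, gold)                          # 5 of 6 unigrams match
resolved = clipped_precision(output, resolve_vocab(gold, vocab))  # all 6 match
```

With the resolved reference, an <unk> in the output matches an <unk> in the gold sentence, so the score can only be equal to or higher than the one computed against the raw reference.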
Then, I think we will add what @Etienne38 proposed based on @mrelich's work:
The difference is that it's not directly related to validation (in the sense of how it is used in OpenNMT) but is a way to automate the translation of files across epochs. Something like this:
th train.lua [...] -translate_files file1.txt file2.txt file3.txt -translate_every 2
This would translate file1.txt, file2.txt and file3.txt every 2 epochs. Does this make sense, and would it be useful?
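To make the intended behavior concrete, the scheduling could look roughly like the Python sketch below. The function name and the output file naming scheme are purely hypothetical. The proposal itself only specifies the list of files and the epoch interval.

```python
def translation_jobs(epoch, files, every):
    """Return (input, output) path pairs to translate at the end of `epoch`.

    The output naming scheme is an assumption made for this example.
    """
    if every <= 0 or epoch % every != 0:
        return []
    return [(path, f"{path}.epoch{epoch}.out") for path in files]


files = ["file1.txt", "file2.txt", "file3.txt"]
for epoch in range(1, 5):
    for src, out in translation_jobs(epoch, files, every=2):
        print(f"epoch {epoch}: translate {src} -> {out}")
```

With `every=2`, nothing is translated after epochs 1 and 3, and all three files are translated after epochs 2 and 4.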