Save validation translations at each epoch

(Etienne Monneret) #1

If ONMT is building the validation translation for its evaluation at each epoch, why not save it to a file at the same time as the model? Perhaps with the same name as the model and a “.valid.trans” suffix?

(Guillaume Klein) #2

It does not actually translate the validation data but just computes the perplexity on this dataset.

See also:

Early stopping: a fake solution?
(Etienne Monneret) #3

It would be useful to be able to give an optional list of files to be translated at each epoch.

In my tests, I want to evaluate BLEU/TER/WER/… values at each epoch. I want to do it on a validation file and on a small part of the training file, to get curves similar to the training PPL and validation PPL curves, but with evaluation metrics. To do this in a comfortable way, I have to stop the training, launch all the translations, then restart the training. It’s time-consuming, and there is always a risk of damaging some options somewhere and not restarting in the right conditions.

(Guillaume Klein) #4

We are thinking about a meta-script that runs the training epoch by epoch, runs arbitrary commands after each epoch (BLEU evaluation, for example), and sets new options as needed. It could just be a script that abstracts the retraining command line and lets the user plug in arbitrary rules.
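A minimal sketch of what such a driver could look like, assuming hypothetical data/output file names and the standard OpenNMT retraining flags (`-train_from`, `-continue`, `-end_epoch`); the exact checkpoint naming below is an assumption, since real checkpoints also embed the validation perplexity:

```python
# Hypothetical sketch of the proposed meta-script: run training one epoch at a
# time, then run arbitrary commands (e.g. translation for BLEU evaluation)
# between epochs. File names and checkpoint names are assumptions.
import subprocess

def epoch_commands(epoch, model_dir="models"):
    """Build the train and evaluate commands for one epoch cycle."""
    train = ["th", "train.lua", "-data", "demo-train.t7",
             "-save_model", model_dir + "/model", "-end_epoch", str(epoch)]
    if epoch > 1:
        # Resume from the previous epoch's checkpoint.
        train += ["-train_from",
                  "%s/model_epoch%d.t7" % (model_dir, epoch - 1), "-continue"]
    evaluate = ["th", "translate.lua",
                "-model", "%s/model_epoch%d.t7" % (model_dir, epoch),
                "-src", "valid-src.txt",
                "-output", "pred_epoch%d.txt" % epoch]
    return train, evaluate

def run_epochs(n_epochs, runner=subprocess.run):
    """Drive the train/evaluate cycle; `runner` is injectable for testing."""
    for epoch in range(1, n_epochs + 1):
        for cmd in epoch_commands(epoch):
            runner(cmd, check=True)
```

The point of the design is that the evaluation step between epochs is an arbitrary shell command, so BLEU, TER, or anything else can be plugged in there.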

There are no definite plans for it yet, but I think @vince62s is exploring the possibilities of such a training process.

(Terence Lewis) #5

I’m starting to realise too that this would be something handy :slight_smile:

(Vincent Nguyen) #6

I will try to check in a new recipe to support this. Patience.

I am just validating that it works fine in various contexts.

@tel34: did you try a deeper network?

(Terence Lewis) #7

@tel34: did you try a deeper network?
@vince62s Training is ongoing - it’s a big corpus :slight_smile: Will report my findings as soon as it finishes!

(Matt Relich) #8

Hi all! I also wanted this feature, so I hacked a translate option into Trainer.lua. You can find my code here:

If you think this might be useful I can clean it up and merge.

(Guillaume Klein) #9

Hi, very nice!

We could definitely check that in. And it is a good opportunity to add BLEU as a validation metric.

We’ll see that next week. :wink:

(Matt Relich) #10

Great! Let me know if you need any help or want me to make any modifications.

(jean.senellart) #11

Hi Matt,

I introduced BLEU code in Lua here:

Would you like to add this to your branch to also calculate the score at each epoch?

It is now merged in master. The function to call for calculating BLEU in the code is simply:

BLEU = onmt.scorers.bleu(cand, refs) 


  • cand is a table of sequences representing the translation output (each sequence is a table of tokens).
  • refs is a table of references; each reference has the same format as cand.

(Vincent Nguyen) #12


Is this “smoothed” BLEU+1 or plain BLEU? The difference mainly matters for short sequences.

Just to make sure the scores are comparable.

(jean.senellart) #13

It is regular BLEU, as implemented in multi-bleu.perl.

(Vincent Nguyen) #14

OK, just to be clear: with no smoothing, it returns BLEU=0 for sequences of length 3 or less when the default order is 4.
In Moses, when per-sentence scoring is needed, the score is smoothed with the +1, since there may be many very short sentences. I am just pointing out that since we sort by size to create batches, several batches may be scored 0.

Not an issue, just for reference.
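As a toy illustration in Python (not the Lua scorer itself) of why unsmoothed BLEU-4 is exactly zero for candidates shorter than 4 tokens, and how +1 smoothing avoids it:

```python
# Toy single-reference BLEU: with plain BLEU-4, a candidate shorter than 4
# tokens has no 4-grams at all, so one precision is zero and the geometric
# mean -- hence the whole score -- collapses to 0. "+1" smoothing adds one
# to each n-gram numerator and denominator so every order stays nonzero.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(cand, ref, max_order=4, smooth=False):
    log_precisions = []
    for n in range(1, max_order + 1):
        matches = sum((ngrams(cand, n) & ngrams(ref, n)).values())
        total = max(len(cand) - n + 1, 0)
        if smooth:  # BLEU+1 smoothing
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            return 0.0  # plain BLEU: one empty order zeroes the whole score
        log_precisions.append(math.log(matches / total))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_order)

cand = ["the", "cat", "sat"]          # only 3 tokens: no 4-grams exist
ref = ["the", "cat", "sat"]
print(bleu(cand, ref))                # 0.0 with plain BLEU-4
print(bleu(cand, ref, smooth=True))   # 1.0 with +1 smoothing (perfect match)
```

This matches the point above: batches of very short sentences all score 0 under plain BLEU-4, while BLEU+1 keeps them comparable.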

(Guillaume Klein) #15

Thanks @jean.senellart for the BLEU implementation.

I’m working on integrating this and extending @mrelich’s work. :wink:

(Matt Relich) #16

@jean.senellart sorry, I didn’t see your message! For some reason I wasn’t notified of it. Yes, I am happy to integrate this into my branch, but it seems @guillaumekln is already working on it. I’m happy to help if needed; just ping me next time, since that seems to get an email to my inbox.

(Guillaume Klein) #17

I first propose supporting BLEU as a validation metric, which should be a very useful feature:

It’s not exactly this feature request, as we don’t save the translation output, but the idea is to introduce BLEU scoring (or a similar score) during the training process. Everything that used the validation perplexity now more generally uses the validation score, including the learning rate decay.

Monitoring the BLEU evolution should be more natural than perplexity. However, because of how the preprocessing is set up, BLEU will be computed against gold sentences with resolved vocabulary, i.e. with OOV words replaced by <unk> tokens. I think this is a detail that does not impact the score interpretation, but let me know if you have any concerns about it.
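The “resolved vocabulary” caveat above can be illustrated with a toy Python sketch (the vocabulary and sentences are made up): any reference token outside the training vocabulary becomes <unk> before BLEU is computed.

```python
# Toy illustration of vocabulary resolution: tokens not in the training
# vocabulary are replaced by <unk>, so BLEU is scored against this resolved
# reference rather than the raw gold sentence.
def resolve_vocab(tokens, vocab):
    return [t if t in vocab else "<unk>" for t in tokens]

vocab = {"the", "cat", "sat", "on", "mat"}
print(resolve_vocab(["the", "dog", "sat"], vocab))  # ['the', '<unk>', 'sat']
```

A model that correctly emits <unk> for an OOV word thus gets n-gram credit for it, which slightly inflates the score compared to scoring against the raw reference.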

Then, I think we will add what @Etienne38 proposed based on @mrelich work:

The difference is that it is not directly related to validation (in the sense of how it is used in OpenNMT) but is a way to automate the translation of files across epochs. Something like this:

th train.lua [...] -translate_files file1.txt file2.txt file3.txt -translate_every 2

which translates file1.txt, file2.txt, and file3.txt every 2 epochs. Does this make sense / would it be useful?

(Etienne Monneret) #18


(Vincent Nguyen) #19

IMO, translating files could be done separately in another process.

I find it odd to mix “valid” and “test” stuff in the training process.

Maybe just translating the “valid_src” (since it is translated anyway to score with BLEU) would be enough.

(Etienne Monneret) #20

Each epoch, without stopping/restarting the training process, I would like to be able to translate files such as:

  • a validation set = sentences supposed not to overlap the training data
  • a checking set = sentences contained in the training data
  • both of them, possibly in 2 versions = with generic content or in-domain content
  • a file used in another, completely different training, for example from another domain, for comparison