@jean.senellart Sorry, I got busy the last few days and didn’t respond. I like @dbl’s suggestion to use Damerau-Levenshtein distance divided by sentence length. I see there is already an implementation of this now, but if needed I can test / work on it this coming weekend.
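For concreteness, here is a minimal sketch of that metric over tokenized sentences (the names are illustrative, not taken from the existing implementation). It uses the optimal string alignment variant of Damerau-Levenshtein and normalizes by reference length:

```lua
-- Sketch: optimal string alignment variant of Damerau-Levenshtein
-- over sentences given as Lua tables of tokens.
local function damerauLevenshtein(ref, hyp)
  local d = {}
  for i = 0, #ref do
    d[i] = {}
    d[i][0] = i
  end
  for j = 0, #hyp do
    d[0][j] = j
  end
  for i = 1, #ref do
    for j = 1, #hyp do
      local cost = (ref[i] == hyp[j]) and 0 or 1
      d[i][j] = math.min(d[i-1][j] + 1,      -- deletion
                         d[i][j-1] + 1,      -- insertion
                         d[i-1][j-1] + cost) -- substitution
      -- transposition of two adjacent tokens
      if i > 1 and j > 1 and ref[i] == hyp[j-1] and ref[i-1] == hyp[j] then
        d[i][j] = math.min(d[i][j], d[i-2][j-2] + 1)
      end
    end
  end
  return d[#ref][#hyp]
end

-- Normalize by reference length so sentences of different
-- lengths get comparable scores (0 means identical).
local function normalizedDistance(ref, hyp)
  if #ref == 0 then return (#hyp == 0) and 0 or 1 end
  return damerauLevenshtein(ref, hyp) / #ref
end
```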
I think it would be nice to have TER (translation error rate) as well, but since TER adds a search over phrase shifts on top of the usual edit operations, it would likely take me quite a while to code up in Lua, and I’m not sure about the performance.
@jean.senellart I came back to this to see whether translating the validation set every epoch had been included in the updates, but it doesn’t seem so. Does that mean we will go with @vince62s’s script for unloading / reloading during training? That’s a pity, since it seems trivial to just run through the validation data during training.
If there is no plan to add this feature, I will maintain my own script for it in case anyone else is interested. I should get back to this task in the next few days and will update my fork.
Would you prefer an output with just the validation set translation, or, more usefully, a file with both reference and translation plus a score for each sentence (BLEU or TER) at the beginning of each line? (Similar to the output of analysis.perl in the Moses project.)
@guillaumekln I’m with @vince62s on this one: I think having a flag to control whether or not to save would be best, so users can choose to turn it on. Having it also output the scores would be nice for additional analysis. Would you like me to take a stab at implementing this?
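Something like this is what I have in mind, declared with torch.CmdLine like the other training options (just a sketch; the option name `-save_valid_translation` is hypothetical):

```lua
require 'torch'

-- Sketch only: '-save_valid_translation' is a hypothetical option name.
local cmd = torch.CmdLine()
cmd:option('-save_valid_translation', false,
           [[If set, dump the validation set translation after each epoch.]])
local opt = cmd:parse(arg)

if opt.save_valid_translation then
  -- hook the dump into the end-of-epoch validation pass here
end
```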
@guillaumekln Sweet, that looks great! So at the moment it would just dump the output, which is actually fine for me. If we also want the scores, then either CSV or some other delimiter (e.g. | or <>).
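As a sketch of the scored dump I have in mind (the function name and format are placeholders, not an agreed spec):

```lua
-- Sketch: one sentence per line, "score | reference | translation".
local function dumpValidation(path, scores, refs, hyps)
  local f = assert(io.open(path, 'w'))
  for i = 1, #refs do
    f:write(string.format('%.4f | %s | %s\n', scores[i], refs[i], hyps[i]))
  end
  f:close()
end
```

Plain CSV would work too, but translations can contain commas, so a rarer delimiter is safer for later analysis scripts.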
Note that the comment I made earlier in this thread still applies:
> However, as we set up the preprocessing, BLEU will be computed against gold sentences with resolved vocabulary, i.e. with OOVs replaced by `<unk>` tokens.