Save validation translations at each epoch

ah ah … sorry I never pushed the last one, will do shortly.


As said, you are unloading the model at each epoch. If you want to translate 2 files, for example, you then need to reload it 3 times at each epoch. It is a rather heavy process but, of course, it works.

In my original query at the top of this thread, I was thinking that the translation was done in the learning process. Now I know that wasn't the case.

I still think that, if the translation were included in the learning process, a list of files to translate on the fly would be an interesting feature.

Unloading / reloading the model takes a few seconds compared to the actual translation time and training time.

The bash way of doing things is not optimal, but at one point we were also discussing some kind of YAML file to describe a training process / schedule. That may be the solution in the end.
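No such YAML format exists at this point; purely as a hypothetical illustration of what a training schedule describing files to translate on the fly might look like (every key here is invented, not a real OpenNMT option):

```yaml
# Hypothetical training schedule - all keys are illustrative, not real options
train:
  data: data/demo-train.t7
  epochs: 13
validation:
  metric: bleu
  translate:          # files to translate on the fly at each epoch
    - data/valid-src.txt
    - data/test-src.txt
  save_translations: true
```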

Anyhow, I leave it to Guillaume / Jean to decide, it’s their baby :slight_smile:

NB: having a look at the Moses EMS meta script could be inspiring.

Hi @mrelich, I was thinking about adding more metrics to complete this development - what would be useful for your use case: CER/WER?


Damerau-Levenshtein edit distance divided by the length of the reference might be handy.
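To make the suggested metric concrete, here is a minimal Python sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance, normalized by the reference length. This is an illustration only, not the OpenNMT implementation; the function names are mine:

```python
def osa_distance(a, b):
    # Optimal String Alignment: Levenshtein plus adjacent transpositions
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def dl_ratio(hyp, ref):
    # edit distance normalized by the reference length
    return osa_distance(hyp, ref) / max(len(ref), 1)
```

For example, `osa_distance("ca", "ac")` is 1 (one transposition), whereas plain Levenshtein would give 2.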


Would you consider doing a pull request for that?

I made it very easy to extend; see for example the BLEUEvaluator, which simply implements the logic of scoring a table of predictions against their references.

Sure, I’ll give it a try. I still don’t know Lua very well, and I’m bogged down a bit at work, so it might be 6-8 weeks, but I’d like to do it. :slight_smile:


Great! Let us know if you need help; we will be glad to help, but we also want to train our developer community ;)!

@jean.senellart @guillaumekln - I snuck in some time, so no need to wait 6-8 weeks after all. :smile: Just submitted it. Feedback welcome & encouraged!


This is great, I will test it.

By the way, do you know a “good” bash/perl/py script that does the same thing offline from 2 text files?

cheers.

Eh… I saw this Cython implementation of DL edit distance, which should be pretty fast. I haven’t used it myself, though. We have proprietary code for computing several metrics in parallel; unfortunately, nothing I can share.
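Lacking a shareable script, a minimal pure-Python sketch for the offline case asked about above (scoring two parallel text files by average per-line edit-distance ratio) might look like the following. It uses plain Levenshtein rather than the transposition-aware variant, and the names are mine:

```python
def levenshtein(a, b):
    # iterative two-row Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def score_files(hyp_path, ref_path):
    # average per-line edit-distance ratio over two parallel text files
    total, n = 0.0, 0
    with open(hyp_path) as h, open(ref_path) as r:
        for hyp, ref in zip(h, r):
            hyp, ref = hyp.rstrip("\n"), ref.rstrip("\n")
            total += levenshtein(hyp, ref) / max(len(ref), 1)
            n += 1
    return total / max(n, 1)
```

Call `score_files("output.txt", "reference.txt")` to get the corpus-level average; for long lines, a C/Cython implementation would of course be much faster.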

Another question: I think you did this on a per-character basis.

Does it make sense to do it also at the word level?

At the character level, it’s edit distance; at the word level, it’s WER (same algorithm). So, yes, it’s certainly a valid metric. We tend to prefer character-level because we believe it reflects post-editing effort better, but there are advantages and disadvantages to both approaches, in my opinion.
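To make the point concrete, the same dynamic-programming routine yields both metrics; only the unit changes (a string of characters vs. a list of tokens). A small illustration, with invented names:

```python
def edit_distance(a, b):
    # two-row Levenshtein; `a` and `b` can be strings (characters) or token lists (words)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y))) # substitution (0 if equal)
        prev = cur
    return prev[-1]

hyp = "the cat sat on mat"
ref = "the cat sat on the mat"

cer = edit_distance(hyp, ref) / len(ref)                          # character level
wer = edit_distance(hyp.split(), ref.split()) / len(ref.split())  # word level (WER)
```

Here the missing "the" costs 4 character edits out of 22 characters but only 1 word edit out of 6 words, which is exactly why the two levels can rank systems differently.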

I’ve come up with a hybrid approach, but haven’t had the time to properly implement, test, and (very optimistically) publish it yet. :wink:


@jean.senellart Sorry, I got busy the last few days and didn’t respond. I like @dbl’s suggestion to use Damerau-Levenshtein divided by length. I see there is already some implementation of this, but if needed I can test / work on it over the coming weekend.

Thanks again for including this!!

I think it would be nice to have TER (translation error rate) as well, but it would likely take me quite a while to code up TER in Lua, and I’m not sure about performance.

In theory, one could make a subprocess call to the latest Java implementation of TER, but I doubt we want such an external dependency.

Hi David, I will take TER - we have reimplemented it several times, so it should be quite fast.


TER is now also available - as a metric for score.lua or as a validation metric. See http://opennmt.net/OpenNMT/tools/scorer/


@jean.senellart I was coming back to this again to see whether the per-epoch translation was included in the updates, but it doesn’t seem so. Does that mean we will go with @vince62s’s script for unloading / reloading during training? It’s a pity, since it seems trivial to just run through the validation data during training.

If there is no plan to add this feature, then I will try to maintain my own script for this in case anyone else is interested. I should be getting back to this task in the next few days and will update my fork.

Cheers

We can now save the validation translation when using translation-based validation metrics (BLEU, D.-L. ratio). Would that work for you?

It would be preferable to have a flag to trigger the saving.
When using sampling with hundreds of epochs, it’s not convenient.