Save validation translations at each epoch

mrelich · June 21, 2017, 3:29pm

@guillaumekln having that option would be perfect!

@vince62s I agree it’s not standard, but for some applications it’s nice to see the translated output to get a feel for where models are doing well or where they are struggling. This saves you the extra step of either stopping training to evaluate, or having to run another job in parallel (and for those who have to rely on AWS for GPUs, this requires firing up a machine, copying data, etc.).

vince62s · June 21, 2017, 3:53pm

My only fear is that if we add too many things to the basic training process it will become a monster with hundreds of options.
On the other hand, it would be quite easy (and nice) to have a meta-script that accomplishes whatever each individual user wants. For instance, in most cases it does not make sense to translate test files after 1 or 2 epochs; getting the BLEU score on the validation set is more than enough.

Documenting a project like this is a challenging task, and it becomes even more difficult reading the code…

Anyway, I am just scared that we end up with a “do_all_for_me_please.lua” script including language-dependent tokenization, preprocessing, training, translating, …
cleaning the corpus?

WRT running another process for translating, why don’t you just open another session and run it on the same machine?
AWS with 2 GPUs would be perfect…
Last but not least: have a look at recipes … https://github.com/OpenNMT/Recipes

Etienne38 · June 21, 2017, 4:11pm

Because performance on a single GPU drops drastically when training and translating run at the same time! And that is only the performance aspect of the problem. If training already fills 9.5G on an 11G GPU, you simply can’t load the model a second time to translate at the same time…

This request is motivated by real problems encountered in real situations; it is not a joke made for the pleasure of having a “do_all_for_me_please.lua” script…

mrelich · June 21, 2017, 4:14pm

@vince62s Yes, I completely understand where you are coming from. It is a delicate balance of creating something useful and not adding too much complexity. If translation after N epochs is where you want to draw the line, then that’s fine, I will drop it. It’s not difficult for me to continue adding this hack to newer versions of OpenNMT, since it is very useful for me (and a few others I suppose).

I am using OpenNMT as part of a pipeline for building a spelling correction service, so perplexity and BLEU score are kind of useless in this case. I want to know a true accuracy, i.e. how often the correct sequence of letters is predicted exactly. This cannot be done without the translated output.
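As a rough illustration of the "true accuracy" described above, here is a minimal Python sketch that counts how often a predicted line matches its reference exactly. The function name `exact_match_accuracy` and the sample lines are hypothetical; it assumes predictions and references are already aligned line by line, as they would be when read from the translated output and the reference file.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that are identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Only surrounding whitespace is ignored; any inner difference is a miss.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical character-tokenized spelling-correction output.
preds = ["h e l l o", "w o r l d", "s p e l"]
refs = ["h e l l o", "w o r l d", "s p e l l"]
print(exact_match_accuracy(preds, refs))
```

Unlike BLEU, this gives no partial credit: a sequence that is off by a single letter counts as wrong.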

You are also right, I could simply pick a large GPU instance, but now you are effectively saying to spend more money when a simple software solution exists that doesn’t require the spending of money. I don’t need that extra GPU until after each epoch is done training, so why pay for it and let it sit idle most of the time?

vince62s · June 21, 2017, 4:19pm

guys I am not saying this is useless.

I did it myself and shared it here https://github.com/OpenNMT/Recipes (without 2 GPUs)

just questioning what should and what should not be part of Onmt (i.e. embedded vs. meta-script).

Etienne38 · June 21, 2017, 4:22pm

That’s the reason why I argued that a script won’t be a solution with a single GPU if the model currently being trained is not unloaded from the card.

Etienne38 · June 21, 2017, 4:31pm

I read your script carefully, but I don’t understand how it does a translation at each epoch.

vince62s · June 21, 2017, 5:05pm

ah ah … sorry I never pushed the last one, will do shortly.

Etienne38 · June 21, 2017, 7:36pm

As I said, you are unloading the model at each epoch. If you want to translate 2 files, for example, you then need to reload it 3 times at each epoch. A rather heavy process. But, of course, it works.
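The unload/reload loop described here can be sketched as a small driver that runs each step as a separate process, so the GPU is released between training and each translation. This is only a sketch under assumptions, not the script from the Recipes repository: the helper names are hypothetical, and the `th train.lua` / `th translate.lua` flags (`-data`, `-save_model`, `-train_from`, `-start_epoch`, `-end_epoch`, `-model`, `-src`, `-output`, `-gpuid`) and the `<save_model>_epochN_<ppl>.t7` checkpoint naming are written from memory and should be checked against the OpenNMT documentation. Whether `-continue` is also needed to carry over optimizer state is left open.

```python
import glob
import subprocess


def train_cmd(data, save_model, epoch, prev_checkpoint=None, gpuid=1):
    """Build the command that trains exactly one epoch (flags are assumptions)."""
    cmd = ["th", "train.lua", "-data", data, "-save_model", save_model,
           "-start_epoch", str(epoch), "-end_epoch", str(epoch),
           "-gpuid", str(gpuid)]
    if prev_checkpoint is not None:
        # Resume from the checkpoint written by the previous epoch.
        cmd += ["-train_from", prev_checkpoint]
    return cmd


def translate_cmds(checkpoint, src_files, epoch, gpuid=1):
    """One translate command per source file; each one reloads the model."""
    return [["th", "translate.lua", "-model", checkpoint, "-src", src,
             "-output", "%s.epoch%d.out" % (src, epoch), "-gpuid", str(gpuid)]
            for src in src_files]


def find_checkpoint(save_model, epoch):
    """Locate the checkpoint for an epoch (naming scheme is an assumption)."""
    matches = sorted(glob.glob("%s_epoch%d_*.t7" % (save_model, epoch)))
    return matches[-1] if matches else None


def run_schedule(data, save_model, src_files, epochs, run=subprocess.check_call):
    """Train one epoch, exit, translate every file, then resume training."""
    checkpoint = None
    for epoch in range(1, epochs + 1):
        run(train_cmd(data, save_model, epoch, checkpoint))
        checkpoint = find_checkpoint(save_model, epoch)
        if checkpoint is None:
            raise RuntimeError("no checkpoint found for epoch %d" % epoch)
        for cmd in translate_cmds(checkpoint, src_files, epoch):
            run(cmd)
```

With 2 files to translate, each epoch starts 3 processes (1 training run and 2 translations), which is the 3 model loads per epoch counted above.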

In my original query at the top of this thread, I was thinking that the translation was done in the learning process. Now I know it wasn’t the case.

I still think that, if translation were now included in the learning process, a list of files to translate on the fly would be an interesting feature.

vince62s · June 21, 2017, 8:00pm

unloading / reloading the model takes a few seconds compared to actual translation time and training time.

the bash way of doing things is not optimal, but at one point we were also discussing some kind of yaml file to describe a training process / schedule. That may be the solution in the end.

Anyhow, I leave it to Guillaume / Jean to decide, it’s their baby

NB: having a look at the Moses EMS meta script could be inspiring.

jean.senellart · June 21, 2017, 9:33pm

Hi @mrelich, I was thinking about adding more metrics to complete this development - what would be useful for your use case: CER/WER?

dbl · June 21, 2017, 11:11pm

Levenshtein-Damerau edit distance divided by length of reference might be handy.
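For readers unfamiliar with the metric, here is a minimal Python sketch of an edit distance with transpositions, divided by the reference length. It implements the optimal string alignment variant (often called restricted Damerau-Levenshtein), which is an assumption about what is meant here and is not the code that was later submitted. The function names and the convention for an empty reference are hypothetical.

```python
def osa_distance(hyp, ref):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(hyp) + 1, len(ref) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of hyp[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of ref[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (i > 1 and j > 1 and hyp[i - 1] == ref[j - 2]
                    and hyp[i - 2] == ref[j - 1]):
                # Swap of two adjacent symbols counts as a single edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_distance(hyp, ref):
    """Edit distance divided by the reference length."""
    if not ref:
        # Assumed convention: empty reference scores 0 only for empty output.
        return 0.0 if not hyp else 1.0
    return osa_distance(hyp, ref) / len(ref)


print(normalized_distance("recieve", "receive"))
```

Because the result is divided by the reference length, it can exceed 1 when the hypothesis is much longer than the reference.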

guillaumekln · June 22, 2017, 7:12am

Would you consider doing a pull request for that?

I made it very easy to extend, see for example the BLEUEvaluator:

github.com/OpenNMT/OpenNMT/blob/master/onmt/evaluators/BLEUEvaluator.lua

local BLEUEvaluator, parent = torch.class('BLEUEvaluator', 'TranslationEvaluator')

function BLEUEvaluator:__init(translatorOpt, dicts)
  parent.__init(self, translatorOpt, dicts)
end

function BLEUEvaluator:score(predictions, references)
  local bleu = onmt.scorers['bleu'](predictions, { references })
  return bleu * 100
end

function BLEUEvaluator:compare(a, b, delta)
  return onmt.evaluators.Evaluator.higherIsBetter(a, b, delta)
end

function BLEUEvaluator:__tostring__()
  return 'BLEU'
end

return BLEUEvaluator

that simply implements the logic of scoring a table of predictions against their references.

dbl · June 22, 2017, 7:58am

Sure, I’ll give it a try. Still don’t know lua very well, and I’m bogged down a bit at work, so it might be 6-8 weeks, but I’d like to do it.

jean.senellart · June 22, 2017, 10:53am

great! and let us know if you need help, we will be glad to help but we also want to train our developer community ;)!

dbl · June 23, 2017, 3:30pm

@jean.senellart @guillaumekln - I snuck in some time, so no need to wait 6-8 weeks after all. Just submitted it. Feedback welcome & encouraged!

vince62s · June 23, 2017, 4:07pm

this is great, will test this.

btw, do you know a “good” bash/perl/py script that does the same thing offline from 2 text files?

cheers.

dbl · June 23, 2017, 4:30pm

Eh… I saw this Cython implementation of DL edit distance which should be pretty fast. I haven’t used it myself, though. We have proprietary code for doing several metrics in parallel; unfortunately nothing I can share.

vince62s · June 23, 2017, 4:37pm

another one: I think you did this on a per-character basis.

does it make sense to do it also at a word level ?

dbl · June 23, 2017, 4:42pm

At the character level, it’s edit distance; at the word level, it’s WER (same algorithm). So, yes, it’s certainly a valid metric. We tend to prefer character-level because we believe it reflects post-editing effort better, but there are advantages and disadvantages to both approaches, in my opinion.
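To make the "same algorithm" point concrete, here is a small Python sketch in which one edit-distance function is applied either to characters or to whitespace-separated words. It uses plain Levenshtein distance (no transpositions) for brevity, which is a simplification. The function names and the sample sentences are hypothetical.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (h != r)))   # substitution or match
        prev = curr
    return prev[-1]


def error_rate(hyp_units, ref_units):
    """Edit distance divided by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(hyp_units, ref_units) / len(ref_units)


hyp = "the cat sat on teh mat"
ref = "the cat sat on the mat"

cer = error_rate(hyp, ref)                  # units are characters
wer = error_rate(hyp.split(), ref.split())  # units are words
print(cer, wer)
```

The only difference between the two calls is the unit of comparison: a single misspelled word costs one full word error at the word level, but only a couple of character edits at the character level.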

I’ve come up with a hybrid approach, but haven’t had the time to properly implement, test, and (very optimistically) publish it yet.