Error when using the ensemble setting

nvr-rug · March 20, 2018, 11:51am

I tried the ensemble branch of OpenNMT to use multiple models when translating (as explained here).

However, I trained my models with the latest version of OpenNMT, while that branch is a year old. When trying to run the ensemble part, I get this error:

./onmt/utils/Tensor.lua:137: 'for' limit must be a number
stack traceback:
./onmt/utils/Tensor.lua:137: in function 'initTensorTable'
./onmt/modules/BiEncoder.lua:143: in function 'forward'
./onmt/translate/Translator.lua:331: in function 'fun'
./onmt/utils/ThreadPool.lua:24: in function 'dispatch'
./onmt/translate/Translator.lua:326: in function 'translateBatch'
./onmt/translate/Translator.lua:532: in function 'translate'
translate.lua:110: in function 'main'
translate.lua:201: in main chunk
[C]: in function 'dofile'

Is this due to the fact that I trained with the new version of OpenNMT? Is there another way of ensembling in the new version? Or is it something else that I missed?

emartinezVic · March 20, 2018, 12:22pm

Hi @nvr-rug,

I’ve used the ensemble implementation without any problems, but I was using “old” OpenNMT models too…

I looked at the code and I think there is a problem with the numStates/numEffectiveLayers argument for the BiEncoder object.
The most recent version uses numEffectiveLayers, while the ensemble version calls the same parameter numStates.
To ensure forward compatibility, you can add the following to the BiEncoder.load function in the BiEncoder.lua file from the ensemble release (around line 104 or so):

self.args.numStates = self.args.numStates or self.args.numEffectiveLayers

Hope this works

guillaumekln · March 20, 2018, 12:44pm

Also check out the script tools/average_models.lua. That might work for you with the added benefit of decoding a single model.

nvr-rug · March 20, 2018, 12:58pm

Thanks for the very quick answers!

@emartinezVic, unfortunately this did not solve the problem; I still get the same error. Perhaps there are more parameters that differ?

@guillaumekln, I already use averaging to combine the epoch checkpoints within an individual run, but now I want to combine models from different runs (trained with the same parameters), and for that I have to use ensembling, right? (I tried averaging those models before, and it raised an error.)

jean.senellart · March 22, 2018, 5:03pm

Hello @nvr-rug

We will look at updating the branch. Can you open an issue on GitHub, specifying exactly the parameters you are using to train your models?

Yes, if you want to combine models from multiple runs, you cannot average them, only ensemble them (but averaging should not fail with an error either).
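To make the distinction concrete, here is a minimal sketch of what parameter averaging amounts to, written in plain Lua with flat tables standing in for model weights. The averageParams helper and the numbers are hypothetical illustrations, not the code of tools/average_models.lua.

```lua
-- Element-wise mean of several parameter vectors of equal length.
-- (Hypothetical helper; real models hold tensors, not flat Lua tables.)
local function averageParams(checkpoints)
  local avg = {}
  for i = 1, #checkpoints[1] do
    local sum = 0
    for _, params in ipairs(checkpoints) do
      sum = sum + params[i]
    end
    avg[i] = sum / #checkpoints
  end
  return avg
end

-- Made-up parameters from two checkpoints of the same training run.
local epoch12 = { 0.5, -1.0, 2.0 }
local epoch13 = { 0.75, -1.5, 2.5 }

local averaged = averageParams({ epoch12, epoch13 })
print(table.concat(averaged, ", "))
```

The result is a single model, so it only makes sense when position i means the same thing in every checkpoint, as it does within one run. Independently initialized runs generally lack that correspondence, which is the usual reason for ensembling them instead.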

nvr-rug · March 23, 2018, 9:07am

Thanks for your reply, just made an issue!

vmsanchez · April 12, 2018, 9:00am

Hello,

Just in case someone is in a hurry and wants to use the ensemble branch with models trained with a newer version of OpenNMT, these are the changes I had to make to get it working, inspired by @emartinezVic's comment:

onmt/modules/BiEncoder.lua, function BiEncoder.load: add the following line just after "self.args.brnn_merge = self.args.brnn_merge or self.args.merge":
self.args.numEffectiveLayers = self.args.numStates or self.args.numEffectiveLayers

onmt/modules/Encoder.lua, function Encoder.load: add the following line just after "self.args = pretrained.args":
self.args.numEffectiveLayers = self.args.numStates or self.args.numEffectiveLayers

Now I am able to translate with an ensemble of models without any crash, although I cannot report on translation quality yet. I cannot guarantee that the results are the same as they would be if the models had been trained with the ensemble branch.
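For anyone wondering why a one-line shim is enough, here is a minimal sketch in plain Lua (no Torch needed) of what the stack trace suggests is happening. It assumes the loop limit in initTensorTable comes from a field that is nil in the loaded model's args. The countStates helper and the args tables are hypothetical stand-ins, not code from OpenNMT.

```lua
-- Hypothetical stand-in for code that loops up to args.numEffectiveLayers.
local function countStates(args)
  local n = 0
  for _ = 1, args.numEffectiveLayers do
    n = n + 1
  end
  return n
end

-- Assumed shape of the args saved by a newer OpenNMT: only numStates is set.
local newerArgs = { numStates = 4 }

-- Without the shim the loop limit is nil, so Lua raises an error.
local ok, err = pcall(countStates, newerArgs)
print(ok, err)

-- The shim: take numStates when present, otherwise keep the existing value.
newerArgs.numEffectiveLayers = newerArgs.numStates or newerArgs.numEffectiveLayers
print(countStates(newerArgs))

-- Args that already carry numEffectiveLayers are left unchanged.
local olderArgs = { numEffectiveLayers = 2 }
olderArgs.numEffectiveLayers = olderArgs.numStates or olderArgs.numEffectiveLayers
print(countStates(olderArgs))
```

Because `a or b` evaluates to `b` only when `a` is nil or false, the same line is safe for models saved under either parameter name.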

emartinezVic · April 12, 2018, 10:23am

Thank you very much, Victor! That is great news.
I wasn’t able to find the forward-compatibility issue with the parameter in Encoder.lua.

If I am not mistaken, ensembling happens at decoding time only, so models trained with the ensemble branch are trained in the same way as with the “official” OpenNMT branch.
So I think the translation quality will depend only on the random initialization, the training parameters (optimizer, learning rate, etc.) and the training data.
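As a rough illustration of ensembling at decoding time, the sketch below combines the next-token distributions of two models at a single decoding step. It is plain Lua with made-up numbers. The thread does not say how the ensemble branch combines model outputs, so the arithmetic mean of probabilities used here is an assumption, and ensembleStep and argmax are hypothetical helpers.

```lua
-- Average the next-token distributions predicted by several models.
-- Each distribution is a table mapping token -> probability.
-- (Assumption: arithmetic mean of probabilities; hypothetical helper.)
local function ensembleStep(distributions)
  local combined = {}
  for _, dist in ipairs(distributions) do
    for token, p in pairs(dist) do
      combined[token] = (combined[token] or 0) + p / #distributions
    end
  end
  return combined
end

-- Return the most probable token of a distribution and its probability.
local function argmax(dist)
  local bestToken, bestP = nil, -math.huge
  for token, p in pairs(dist) do
    if p > bestP then
      bestToken, bestP = token, p
    end
  end
  return bestToken, bestP
end

-- Made-up outputs of two independently trained models for one step.
local modelA = { cat = 0.5, dog = 0.4, bird = 0.1 }
local modelB = { cat = 0.2, dog = 0.7, bird = 0.1 }

local combined = ensembleStep({ modelA, modelB })
local token = argmax(combined)
print(token)
```

Each model keeps its own weights and is only queried during translation, which is why nothing about training needs to change.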

guillaumekln · April 13, 2018, 7:33am

That’s great to hear, Victor! Do you mind making a PR targeting this branch?

vmsanchez · April 16, 2018, 10:17am

Done!

I also implemented the option to save attention, since I needed it for my experiments.

guillaumekln · April 16, 2018, 10:19am

Awesome, thanks!

nvr-rug · April 17, 2018, 9:21am

Thanks a lot for the implementation!

One thing, though: I had to copy onmt/utils/Error.lua from the main branch to the ensemble branch before it worked for me, since that file was missing from the ensemble branch.