Error when using the ensemble setting


(R) #1

I tried the ensemble branch of OpenNMT to translate with multiple models (as explained here).

However, I trained my models with the latest version of OpenNMT, while that branch is a year old. When trying to run the ensemble part, I get this error:

./onmt/utils/Tensor.lua:137: 'for' limit must be a number
stack traceback:
./onmt/utils/Tensor.lua:137: in function 'initTensorTable'
./onmt/modules/BiEncoder.lua:143: in function 'forward'
./onmt/translate/Translator.lua:331: in function 'fun'
./onmt/utils/ThreadPool.lua:24: in function 'dispatch'
./onmt/translate/Translator.lua:326: in function 'translateBatch'
./onmt/translate/Translator.lua:532: in function 'translate'
translate.lua:110: in function 'main'
translate.lua:201: in main chunk
[C]: in function 'dofile'

Is this due to the fact that I trained with the new version of OpenNMT? Is there another way of ensembling in the new version? Or is it something else that I missed?


(Eva) #2

Hi @nvr-rug,

I’ve used the ensemble implementation without any problems, but I was using “old” OpenNMT models too…

I looked at the code and I think there is a problem with the numStates/numEffectiveLayers argument for the BiEncoder object.
In the most recent version the parameter is called numEffectiveLayers, while in the ensemble version the same parameter is called numStates.
To ensure forward compatibility, you can add the following line to the BiEncoder.load function in BiEncoder.lua from the ensemble release (around line 104):

self.args.numStates = self.args.numStates or self.args.numEffectiveLayers

Hope this works :slight_smile:
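
For readers unfamiliar with the "'for' limit must be a number" error reported above, here is a minimal sketch in plain Lua (no Torch or OpenNMT needed) of how a missing field can produce it and how an `or` fallback avoids it. The function `countStates` and the `savedArgs` table are hypothetical stand-ins, not OpenNMT code, and the sketch assumes the loop limit ends up being nil because the saved model uses a different field name.

```lua
-- Hypothetical stand-in for code that loops up to a value stored in the model's args.
local function countStates(args)
  local n = 0
  -- If args.numStates is nil, Lua raises "'for' limit must be a number" here.
  for _ = 1, args.numStates do
    n = n + 1
  end
  return n
end

-- Args as saved by a model that only knows the other field name (assumed for illustration).
local savedArgs = { numEffectiveLayers = 2 }

-- pcall catches the error so the script can keep running and show the message.
local ok, err = pcall(countStates, savedArgs)
print(ok, err)

-- The fallback: keep numStates if it is set, otherwise reuse numEffectiveLayers.
savedArgs.numStates = savedArgs.numStates or savedArgs.numEffectiveLayers
print(countStates(savedArgs))
```

The fallback only helps if the code that reads the value looks up the name being filled in, so it is worth checking which name the branch you are running actually uses.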


(Guillaume Klein) #3

Also check out the script tools/average_models.lua. That might work for you with the added benefit of decoding a single model.
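
As a rough illustration of what averaging models means, the sketch below takes the element-wise mean of several parameter vectors in plain Lua. It is not the code of tools/average_models.lua; the function name `averageParams`, the flat-table representation of a model, and the toy values are all assumptions made for the example.

```lua
-- Element-wise mean of several parameter vectors (flat Lua tables of numbers).
-- All vectors must have the same length, i.e. the models share one architecture.
local function averageParams(models)
  assert(#models > 0, "need at least one model")
  local size = #models[1]
  local avg = {}
  for i = 1, size do
    avg[i] = 0
  end
  for _, params in ipairs(models) do
    assert(#params == size, "parameter count mismatch")
    for i = 1, size do
      avg[i] = avg[i] + params[i] / #models
    end
  end
  return avg
end

-- Toy "checkpoints" from two epochs of the same run (made-up numbers).
local epochA = { 0.5, -1.0, 2.0 }
local epochB = { 1.5, -3.0, 4.0 }
local averaged = averageParams({ epochA, epochB })
print(table.concat(averaged, " "))
```

The result is a single set of parameters, which is why decoding afterwards costs the same as decoding with one model.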


(R) #4

Thanks for the very quick answers!

@emartinezVic, unfortunately this did not solve the problem; I still get the same error. Perhaps there are more parameters that differ?

@guillaumekln, I already use averaging to average over the epoch models within an individual run, but now I want to combine models from different individual runs (trained with the same parameters), and for that I have to use ensembling, right? (I tried averaging them before, and that fails with an error.)


(jean.senellart) #5

Hello @nvr-rug

We will look at updating the branch. Can you open an issue on GitHub, specifying exactly the parameters you are using to train your models?

Yes, if you want to use multiple models, you cannot average, only ensemble (but it should not fail either).


(R) #6

Thanks for your reply, I just opened an issue!


(Víctor M. Sánchez-Cartagena) #7

Hello,

Just in case someone is in a hurry and wants to use the ensemble branch with models trained with a newer version of OpenNMT, these are the changes I had to make to get it working, inspired by @emartinezVic's comment:

  • onmt/modules/BiEncoder.lua, function BiEncoder.load: add the following line just after "self.args.brnn_merge = self.args.brnn_merge or self.args.merge":
    self.args.numEffectiveLayers = self.args.numStates or self.args.numEffectiveLayers

  • onmt/modules/Encoder.lua, function Encoder.load: add the following line just after "self.args = pretrained.args":
    self.args.numEffectiveLayers = self.args.numStates or self.args.numEffectiveLayers

Now I am able to translate with an ensemble of models without any crash, although I cannot report on translation quality yet. I cannot guarantee that the results are the same as they would be if the models had been trained with the ensemble branch.
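
Since the patches suggested in this thread copy the value in opposite directions, a defensive variant is to fill in whichever name is missing. The following is only a sketch in plain Lua: `normalizeArgs` is a hypothetical helper, not part of OpenNMT, and it assumes that numStates and numEffectiveLayers carry the same meaning, as the posts above suggest.

```lua
-- Hypothetical helper: make both field names available on a loaded args table.
-- Like the patch lines above, it prefers numStates when both are set.
local function normalizeArgs(args)
  local value = args.numStates or args.numEffectiveLayers
  args.numStates = value
  args.numEffectiveLayers = value
  return args
end

-- Toy args tables standing in for models saved under either name.
local savedWithNumStates = normalizeArgs({ numStates = 2 })
local savedWithEffectiveLayers = normalizeArgs({ numEffectiveLayers = 4 })
print(savedWithNumStates.numEffectiveLayers, savedWithEffectiveLayers.numStates)
```

If neither field is present, both stay nil, so the original error would still surface rather than being silently masked.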


(Eva) #8

Thank you very much, Victor! That is great news :slight_smile:
I wasn’t able to find the forward-compatibility issue with the parameter in Encoder.lua.

If I am not mistaken, ensembling is done at decoding time only, so models trained with the ensemble branch are trained in the same way as with the “official” OpenNMT branch.
So I think translation quality will depend only on the random initialization, the training parameters (optimizer, learning rate, etc.) and the training data.
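
To make decoding-time ensembling concrete, here is a sketch in plain Lua that combines the next-word distributions of several models by taking their arithmetic mean and then picks the most probable word. Averaging probabilities is one common way to combine models and is an assumption here; this thread does not say how the ensemble branch combines scores. The function `ensembleStep` and the toy distributions are hypothetical.

```lua
-- Combine per-model next-word probability distributions for one decoding step.
-- Each distribution maps word -> probability over the same vocabulary.
local function ensembleStep(distributions)
  assert(#distributions > 0, "need at least one distribution")
  local combined = {}
  for _, dist in ipairs(distributions) do
    for word, p in pairs(dist) do
      combined[word] = (combined[word] or 0) + p / #distributions
    end
  end
  -- Pick the word with the highest averaged probability.
  local bestWord, bestProb = nil, -1
  for word, p in pairs(combined) do
    if p > bestProb then
      bestWord, bestProb = word, p
    end
  end
  return bestWord, combined
end

-- Toy outputs of two independently trained models for the same step (made-up numbers).
local modelA = { cat = 0.5, dog = 0.25, bird = 0.25 }
local modelB = { cat = 0.25, dog = 0.625, bird = 0.125 }
local word, combined = ensembleStep({ modelA, modelB })
print(word, combined[word])
```

In this scheme every model is run at each decoding step and only the outputs are merged, so training is untouched while decoding cost grows with the number of models.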


(Guillaume Klein) #9

That’s great to hear, Victor! Do you mind making a PR targeting this branch?


(Víctor M. Sánchez-Cartagena) #10

Done!

I also implemented the option to save attention, since I needed it for my experiments.


(Guillaume Klein) #11

Awesome, thanks!


(R) #12

Thanks a lot for the implementation!

One thing, though: I had to copy onmt/utils/Error.lua from the main branch to the ensemble branch before it worked for me, since that file is missing from the ensemble branch.