Ensemble decoding

I think it would be great to implement an ensemble decoding feature in OpenNMT, since it seems to improve system performance.

Do you think it would be easy to implement?
I have only just started getting familiar with the OpenNMT code. This decoding feature should be implemented in the onmt.translate.Translator.lua module, right?


Yes, Translator.lua is the entry point. @Dakun implemented the feature in the original seq2seq-attn project, so I will let him comment further.

The process will be:

  • you will have to load several models in Translator:__init (for instance, I suggest that the -model option accept several values separated by commas, to keep the command line simple)
  • then, during the beam search, you basically need to average the outputs of the different models.
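
To make the two steps concrete, here is a minimal Python sketch (not OpenNMT code). It assumes the models share a target vocabulary and that each one returns a probability distribution per decoding step. The helper names `parse_model_option` and `average_distributions` and the model file names are hypothetical, chosen only for illustration.

```python
def parse_model_option(value):
    """Split a comma-separated -model value into a list of model paths."""
    return [path.strip() for path in value.split(",") if path.strip()]


def average_distributions(distributions):
    """Average per-model probability distributions over one shared vocabulary."""
    if not distributions:
        raise ValueError("need at least one distribution")
    size = len(distributions[0])
    if any(len(dist) != size for dist in distributions):
        raise ValueError("all models must share the same target vocabulary size")
    count = len(distributions)
    return [sum(dist[i] for dist in distributions) / count for i in range(size)]


# Hypothetical file names, for illustration only.
paths = parse_model_option("model_a.t7, model_b.t7")

# One decoding step: each model scores the same three target words.
avg = average_distributions([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

In a real beam search this averaging would run once per step and per hypothesis, before the best candidates are selected.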

There are some possible optimizations, like assigning a different GPU to each model for faster processing, but we can leave that for later :slight_smile:.

Feel free to submit a PR and we will be glad to review!


Yes, ensemble decoding can improve performance, especially when the sub-models are diverse.
And it is not a difficult task to implement.

As mentioned by @jean.senellart, the following steps are necessary:

  1. initialization, to load the different models
  2. synchronization during decoding:
     • if you decode with the different models serially, we recommend “lua coroutine” to simplify the procedure
     • if you decode in parallel, “lua threads.[Mutex,Condition]” is necessary, since you cannot predict which process will finish at what time
     • collect the results (e.g. by voting) and feed them into the next estimation step
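
The serial variant can be sketched with Python generators, which play a role similar to Lua coroutines: each decoder pauses after emitting its scores and resumes once the ensemble has chosen a token. This is only an illustration under simplifying assumptions: greedy search instead of beam search, averaging instead of voting, and toy lookup tables (`table_a`, `table_b`) standing in for real models.

```python
def toy_decoder(table):
    """Coroutine-style decoder: yields a next-token distribution, then
    receives the token chosen by the ensemble. `table` is a hypothetical
    stand-in for a model, mapping the previous token to a distribution."""
    prev = "<s>"
    while True:
        prev = yield table[prev]


def ensemble_greedy(decoders, max_len=10):
    # Prime every generator to obtain its first-step distribution.
    dists = [next(dec) for dec in decoders]
    output = []
    for _ in range(max_len):
        # Synchronization point: all models have scored the current step.
        avg = {w: sum(d[w] for d in dists) / len(dists) for w in dists[0]}
        best = max(avg, key=avg.get)
        if best == "</s>":
            break
        output.append(best)
        # Feed the shared choice back to every model for the next step.
        dists = [dec.send(best) for dec in decoders]
    return output


table_a = {
    "<s>": {"a": 0.6, "b": 0.3, "</s>": 0.1},
    "a": {"a": 0.1, "b": 0.7, "</s>": 0.2},
    "b": {"a": 0.1, "b": 0.1, "</s>": 0.8},
}
table_b = {
    "<s>": {"a": 0.4, "b": 0.5, "</s>": 0.1},
    "a": {"a": 0.2, "b": 0.5, "</s>": 0.3},
    "b": {"a": 0.2, "b": 0.2, "</s>": 0.6},
}

result = ensemble_greedy([toy_decoder(table_a), toy_decoder(table_b)])
```

Note that `table_b` alone would pick "b" first, but the averaged scores pick "a", so the models do influence each other at every step.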

Thanks a lot for your rapid (and useful!) feedback!

I think I will first try to implement the “serial ensemble decoding”.

I will let you know if I run into any trouble in the process and, of course, if I succeed with the task :slight_smile:

I would also recommend waiting for the new beam search code ( https://github.com/OpenNMT/OpenNMT/pull/48 ) to land before starting on this.

Thanks @srush for the advice!
I will wait until the new beam search code is released.

I think serial decoding does not do the job. With parallel decoding, you can exploit the different strengths of the different decoders, because one decoder may be better in some specific aspects than the others. But with serial decoding, you can only accept or reject a translation as a whole. I mean, it is not possible to mix the strengths of different decoders at the same time.

Hi @nrazavi!
Thanks for your insight!
However, I think that serial decoding can work too, although it certainly won't be as efficient as parallel decoding.
Consider that you can mix the decoders' information after each of them has translated a beam from a batch, so you have the information from all of your models when deciding which translation is best for a given beam. You can also (somehow) feed this information back to your decoders for the next beam search step.
By parallelizing the process we gain speed, because each model translates the beam at the same time instead of one after the other. But my guess is that we will combine the information from the models in much the same way as in sequential decoding.
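
A small Python sketch of this point: the combination step is the same whether the models are queried one after the other or concurrently, and only the scheduling changes. The two scoring functions are hypothetical stand-ins for real decoders that would score the next word given a prefix.

```python
from concurrent.futures import ThreadPoolExecutor


def model_a(prefix):
    # Hypothetical stand-in: a real decoder would condition on `prefix`.
    return {"x": 0.7, "y": 0.3}


def model_b(prefix):
    return {"x": 0.4, "y": 0.6}


def combine(dists):
    """Average the per-model scores for one beam search step."""
    return {w: sum(d[w] for d in dists) / len(dists) for w in dists[0]}


def step_serial(models, prefix):
    # One model after the other.
    return combine([m(prefix) for m in models])


def step_parallel(models, prefix, pool):
    # All models at once; pool.map returns results in input order.
    return combine(list(pool.map(lambda m: m(prefix), models)))


models = [model_a, model_b]
serial = step_serial(models, ["<s>"])
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    parallel = step_parallel(models, ["<s>"], pool)
```

Both calls return the same averaged scores; the parallel version only helps when the per-model computation is the bottleneck.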

As I said, I haven’t implemented this yet, but this is the idea I have in mind, although I am still working out how to do it :wink:

Hi all,

I was wondering if anyone is working on this feature request. I would be very keen to have ensemble decoding included in OpenNMT: I’ve seen it improve the output of similar NMT systems drastically in the past, and it would be great to combine this improved performance with the user-friendliness of OpenNMT.




Yes, some work has been done on ensemble decoding, but it’s not ready for release yet. I hope we can add it in the coming weeks.


I invite you to start experimenting with it and let me know how it works for you. It is marked as a Work In Progress just because I would like to make the integration nicer (and more efficient?) but it should work as described.

```bash
git clone https://github.com/guillaumekln/OpenNMT.git OpenNMT-ensemble
cd OpenNMT-ensemble
git checkout ensemble
```

Or, using an existing OpenNMT repository:

```bash
git remote add guillaumekln https://github.com/guillaumekln/OpenNMT.git
git fetch guillaumekln
git checkout ensemble
```


Thank you very much! :slight_smile:

I will let you know how it works for us asap :slight_smile:

Hi @guillaumekln !
We’ve tried the ensemble decoding.
Regarding the warning in the documentation:
do the models also have to share the source vocabulary?
If not, some of them may not understand the input sentence correctly.
I mean, for instance, label 23 can be ‘house’ for model1 but ‘car’ for model2. If the source is encoded using the source dictionary from model1, then model2 will see a different sentence, because it will use its representation of the word ‘car’ and not ‘house’, won’t it?
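
The mismatch can be shown with two toy dictionaries in Python. The word-to-id mappings below are made up for illustration and are not taken from any real OpenNMT model.

```python
# Hypothetical source dictionaries for two models (word -> id).
vocab_model1 = {"the": 4, "house": 23, "is": 7, "red": 31}
vocab_model2 = {"the": 4, "car": 23, "is": 7, "red": 31, "house": 52}

sentence = ["the", "house", "is", "red"]

# The input is encoded once, with model1's dictionary.
ids = [vocab_model1[word] for word in sentence]

# model2 interprets the very same ids through its own dictionary.
id_to_word_model2 = {idx: word for word, idx in vocab_model2.items()}
seen_by_model2 = [id_to_word_model2.get(idx, "<unk>") for idx in ids]
```

Here `seen_by_model2` reads "the car is red", so model2 would be translating a different sentence than model1.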

Apart from that, I’ve tried it and it works fine.
I used 4 models from the same training run for English-to-Spanish translation.
In terms of speed, it runs around 4 times slower than single-model decoding.
In terms of BLEU, we gain 0.2.
And in terms of TER, we gain 0.4.
Does that resemble your results?

Thank you very much!
This will be very helpful for future experiments! :slight_smile:


You are correct. In the current implementation the source vocabulary also has to be the same. I should mention that. However, this constraint could actually be removed by just adapting the batch input for each source vocabulary.
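
One way to picture "adapting the batch input" is to encode the same source tokens once per model, each time with that model's own dictionary. The Python sketch below is only an assumption about how this could look, not the actual implementation; `encode_for_each_model` and both dictionaries are hypothetical.

```python
UNK = "<unk>"


def encode_for_each_model(tokens, source_vocabs):
    """Encode the same tokens once per model, using each model's own
    word-to-id dictionary. Unknown words map to that model's <unk> id."""
    return [
        [vocab.get(token, vocab[UNK]) for token in tokens]
        for vocab in source_vocabs
    ]


# Hypothetical dictionaries for two models.
vocab1 = {"<unk>": 0, "the": 4, "house": 23}
vocab2 = {"<unk>": 0, "the": 9, "car": 23, "house": 52}

batches = encode_for_each_model(["the", "house", "garden"], [vocab1, vocab2])
```

Each model then receives its own id sequence for the same sentence, so the word ‘house’ is 23 for the first model and 52 for the second.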

I got up to +1 BLEU, but of course it depends a lot on the models used: how close they are and how well trained.

Thanks for the feedback.


Hmm, just out of curiosity…
Why do you average over the probs and not over the logprobs obtained by the decoders?


Mmh, maybe it seemed cleaner to me at first (averaging actual probabilities) but in practice both approaches should produce the same result so I will certainly simplify this.
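
For reference, here is a small Python comparison of the two schemes on made-up numbers. Averaging probabilities is an arithmetic mean, while averaging log-probabilities corresponds to a (renormalized) geometric mean. The rankings often agree, but they are not guaranteed to, as this deliberately contrived example with two strongly disagreeing models shows.

```python
import math


def average_probs(dists):
    """Arithmetic mean of the probabilities."""
    count = len(dists)
    return [sum(column) / count for column in zip(*dists)]


def average_logprobs(dists):
    """Mean of the log-probabilities, mapped back and renormalized
    (equivalent to a normalized geometric mean)."""
    count = len(dists)
    mean_logs = [sum(math.log(p) for p in column) / count for column in zip(*dists)]
    unnormalized = [math.exp(value) for value in mean_logs]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Made-up distributions over three target words.
model1 = [0.80, 0.19, 0.01]
model2 = [0.01, 0.30, 0.69]

by_probs = average_probs([model1, model2])
by_logprobs = average_logprobs([model1, model2])
best_by_probs = argmax(by_probs)
best_by_logprobs = argmax(by_logprobs)
```

With these numbers, averaging probabilities prefers the first word, while averaging log-probabilities prefers the second, the one both models find plausible.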


Hi, thanks a lot for this feature. It seems to fail, though, when used with -idx_files. Could you please look into this? Thanks!

```
[10/19/17 16:54:51 INFO] Using GPU(s): 1, 2, 3, 4
[10/19/17 16:54:51 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
/home/palaskar/torch-lua5.2/install/bin/lua: ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:179: [thread 3 endcallback] ./onmt/translate/Translator.lua:143: attempt to index field 'src' (a nil value)
stack traceback:
        ./onmt/translate/Translator.lua:143: in function <./onmt/translate/Translator.lua:140>
        (...tail calls...)
        [C]: in function 'xpcall'
        ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:174: in function 'dojob'
        ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
        ./onmt/utils/ThreadPool.lua:31: in function 'dispatch'
        ./onmt/translate/Translator.lua:132: in function '__init'
        ...laskar/torch-lua5.2/install/share/lua/5.2/torch/init.lua:91: in function 'new'
        translate.lua:53: in function 'main'
        translate.lua:201: in main chunk
        [C]: in function 'dofile'
        ...rch-lua5.2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?
stack traceback:
        [C]: in function 'error'
        ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:179: in function 'dojob'
        ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
        ./onmt/utils/ThreadPool.lua:31: in function 'dispatch'
        ./onmt/translate/Translator.lua:132: in function '__init'
        ...laskar/torch-lua5.2/install/share/lua/5.2/torch/init.lua:91: in function 'new'
        translate.lua:53: in function 'main'
        translate.lua:201: in main chunk
        [C]: in function 'dofile'
        ...rch-lua5.2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?
```