Hi!
I think it would be great to implement an ensemble decoding feature in OpenNMT since it seems to improve system performance.
Do you think it would be easy to implement?
I have just started to familiarize myself with the OpenNMT code. This decoding feature should be implemented in the onmt.translate.Translator.lua module, right?
Yes, Translator.lua is the entry point. @Dakun implemented the feature in the original seq2seq-attn project, so I am letting him comment further.
The process will be:
you will have to load several models in Translator:__init (for instance, I suggest we accept several comma-separated values in the -model option to keep the command line simple)
then, during the beam search, you basically need to average the outputs from the different models (see the sketch below)
There are some possible optimizations, like assigning a different GPU to each model for faster processing, but we can leave that for later.
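To make the idea concrete, here is a rough sketch in plain Lua/Torch of those two steps: splitting a comma-separated -model value and averaging the per-step distributions. This is not the actual implementation; the checkpoint layout and the shape of the step outputs are assumptions on my part.

```lua
require('torch')

-- Parse a comma-separated -model value and load each checkpoint with
-- torch.load. The checkpoint structure is whatever train.lua saved
-- (assumed here, not shown).
local function loadEnsemble(modelOpt)
  local checkpoints = {}
  for path in string.gmatch(modelOpt, "[^,]+") do
    table.insert(checkpoints, torch.load(path))
  end
  return checkpoints
end

-- At each beam search step, average the per-model distributions over the
-- next target word before the beam picks its candidates. `stepProbs` is
-- assumed to be a table of (beamSize x vocabSize) tensors, one per model.
local function averageStep(stepProbs)
  local avg = stepProbs[1]:clone()
  for k = 2, #stepProbs do
    avg:add(stepProbs[k])
  end
  return avg:div(#stepProbs)
end
```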
Feel free to submit a PR and we will be glad to review!
Yes, ensemble decoding can improve performance, especially when the sub-models are diverse.
And it's not a difficult task to implement ensemble decoding.
As mentioned by @jean.senellart, the following steps are necessary:
initialization, to load different models
synchronization during decoding:
if you decode the different models serially, we recommend Lua coroutines to simplify the procedure (a toy example follows this list)
if you use parallel decoding, the Lua threads library (Mutex, Condition) is necessary, since you cannot predict which process will finish at what time.
collecting the results (e.g. by voting) and feeding them into the next estimation step
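To illustrate the coroutine suggestion, here is a toy driver loop, placeholder code only and not taken from seq2seq-attn or OpenNMT: each model runs its step function inside a coroutine, yields its prediction, and the driver combines the yielded outputs before resuming everyone with the next input.

```lua
-- Each worker wraps one model's per-step function in a coroutine and
-- yields its prediction back to the driver.
local function makeWorker(stepFn)
  return coroutine.create(function(input)
    while input do
      input = coroutine.yield(stepFn(input))
    end
  end)
end

-- The driver resumes every worker in turn, collects the yielded
-- predictions, combines them (e.g. by averaging) and uses the result
-- as the next input.
local function ensembleLoop(workers, firstInput, combine, maxSteps)
  local input = firstInput
  for _ = 1, maxSteps do
    local outputs = {}
    for _, co in ipairs(workers) do
      local ok, out = coroutine.resume(co, input)
      assert(ok, out)
      table.insert(outputs, out)
    end
    input = combine(outputs)
  end
  return input
end
```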
Hi
I think serial decoding does not do the job. With parallel decoding, you are able to exploit different features of different decoders, because one decoder may be better at some specific aspects than the others. But with serial decoding, you can only accept or reject the complete translation. I mean, it is not possible to mix the strengths of different decoders at the same time.
Hi @nrazavi!
thanks for your insight!
However, I think that serial decoding can work too, although it certainly won't be as efficient as parallel decoding.
Note that you can mix the decoders' information after translating each beam step of a batch, so you will have the information from all of your models to decide which translation is best for a given beam. Also, you will be able to (somehow) feed your decoders with this information for the next beam search step.
By parallelizing the process we will gain speed, because each model will translate the beam at the same time instead of one after the other. But my guess is that we will combine the information from the models in a similar way as in sequential decoding.
As I said, I haven't implemented this yet, but this is the idea I have in mind, although I am still working out how to do it.
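For what it's worth, one way the per-step mixing could look is a weighted combination of the models' per-step log-probabilities, so a decoder that is stronger on some aspect can be given more influence. This is just a sketch under my own assumptions, not working code from the project.

```lua
-- `stepLogProbs` is assumed to be a table of (beamSize x vocabSize)
-- tensors, one per model; `weights` is a table of scalars summing to 1.
local function combineStep(stepLogProbs, weights)
  local combined = stepLogProbs[1]:clone():mul(weights[1])
  for k = 2, #stepLogProbs do
    -- add(alpha, tensor) accumulates alpha * tensor into `combined`
    combined:add(weights[k], stepLogProbs[k])
  end
  return combined
end
```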
I was wondering if anyone is working on this feature request. I would be very keen to have ensemble decoding included in OpenNMT: I've seen it improve the output of similar NMT systems drastically in the past, and it would be great to combine this improved performance with the user-friendliness of OpenNMT.
I invite you to start experimenting with it and let me know how it works for you. It is marked as a Work In Progress just because I would like to make the integration nicer (and more efficient?) but it should work as described.
```bash
git clone https://github.com/guillaumekln/OpenNMT.git OpenNMT-ensemble
cd OpenNMT-ensemble
git checkout ensemble
```

[details=or using an existing OpenNMT repository]
```bash
git remote add guillaumekln https://github.com/guillaumekln/OpenNMT.git
git fetch guillaumekln
git checkout ensemble
```
[/details]
Hi @guillaumekln !
We've tried the ensemble decoding.
Regarding the warning in the documentation…
do the models also have to share the source vocabulary?
If not, maybe some of them are not going to understand the input sentence to translate.
I mean, for instance, label 23 can be "house" for model1 but "car" for model2, and if the source is encoded using the dictionary from model1, then model2 will understand a different sentence to translate because it will be using the representation of the word "car" and not "house", won't it?
Apart from that, I've tried it and it works fine.
I have used 4 models from the same training run for English-to-Spanish translation.
In terms of speed, it runs around 4 times slower than single-model decoding.
In terms of BLEU, we gain 0.2 points.
And in terms of TER, we gain 0.4 points.
Does that resemble your results?
thank you very much!
this will be very helpful for future experiments!
You are correct. In the current implementation the source vocabulary also has to be the same, I should mention that. However, this constraint could actually be removed by just adapting the batch input for each source vocabulary.
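Roughly, the adaptation would look something like the sketch below. This is not code from the branch; it assumes each vocabulary is represented by a pair of idxToLabel/labelToIdx lookup tables.

```lua
-- Before feeding a batch to model k, convert every source index from the
-- reference vocabulary into model k's own vocabulary, falling back to
-- <unk> for words that model does not know.
local function remapSource(srcIndices, refVocab, modelVocab, unkIndex)
  local remapped = srcIndices:clone()
  remapped:apply(function(idx)
    local word = refVocab.idxToLabel[idx]
    return modelVocab.labelToIdx[word] or unkIndex
  end)
  return remapped
end
```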
I got up to +1 BLEU, but of course it depends a lot on the models used: how close they are and how well trained.
Mmh, maybe it seemed cleaner to me at first (averaging actual probabilities) but in practice both approaches should produce the same result so I will certainly simplify this.
Hi, thanks a lot for this feature. However, it seems to fail when used with -idx_files. Could you please look into this? Thanks!
[10/19/17 16:54:51 INFO] Using GPU(s): 1, 2, 3, 4
[10/19/17 16:54:51 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
/home/palaskar/torch-lua5.2/install/bin/lua: ...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:179: [thread 3 endcallback] ./onmt/translate/Translator.lua:143: attempt to index field 'src' (a nil value)
stack traceback:
./onmt/translate/Translator.lua:143: in function <./onmt/translate/Translator.lua:140>
(...tail calls...)
[C]: in function 'xpcall'
...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:174: in function 'dojob'
...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
./onmt/utils/ThreadPool.lua:31: in function 'dispatch'
./onmt/translate/Translator.lua:132: in function '__init'
...laskar/torch-lua5.2/install/share/lua/5.2/torch/init.lua:91: in function 'new'
translate.lua:53: in function 'main'
translate.lua:201: in main chunk
[C]: in function 'dofile'
...rch-lua5.2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
stack traceback:
[C]: in function 'error'
...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:179: in function 'dojob'
...r/torch-lua5.2/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
./onmt/utils/ThreadPool.lua:31: in function 'dispatch'
./onmt/translate/Translator.lua:132: in function '__init'
...laskar/torch-lua5.2/install/share/lua/5.2/torch/init.lua:91: in function 'new'
translate.lua:53: in function 'main'
translate.lua:201: in main chunk
[C]: in function 'dofile'
...rch-lua5.2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?