Improve BLEU by Coverage and Context Gate

I found that using a coverage mechanism significantly improves a standard attention-based NMT system by +1.8 BLEU, and that incorporating a context gate gives a further improvement of +1.6 BLEU (i.e., +3.4 BLEU in total).
Is there any plan to implement these functions?

Hello, which coverage mechanism are you referring to? For reference, I did implement different coverage mechanisms, available here: - and found significant gains on small datasets (a few million sentences), but the effect wears off on larger datasets (10M+ sentences).
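For readers unfamiliar with the idea: a coverage mechanism keeps a running total of the attention mass each source position has received, and feeds that total back into the attention scoring so the model is discouraged from attending to the same words repeatedly. Here is a minimal NumPy sketch of that accumulation loop; all weight names, shapes, and the fixed decoder state are illustrative, not the actual implementation discussed in this thread.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
src_len, hid = 5, 4
H = rng.normal(size=(src_len, hid))   # encoder states, one row per source word
W_h = rng.normal(size=(hid, hid))     # illustrative projection weights
W_s = rng.normal(size=(hid, hid))
w_c = rng.normal(size=(hid,))         # projection of the scalar coverage value
v = rng.normal(size=(hid,))

coverage = np.zeros(src_len)          # accumulated attention per source position
s = rng.normal(size=hid)              # decoder state (kept fixed here for brevity)

for t in range(3):
    # attention energy per source position, with coverage fed back in
    e = np.array([v @ np.tanh(W_h.T @ H[i] + W_s.T @ s + w_c * coverage[i])
                  for i in range(src_len)])
    a = softmax(e)
    coverage += a                     # accumulate attention mass
    context = a @ H                   # attended source context for this step
```

Because each step's attention weights sum to 1, the coverage vector sums to the number of decoding steps taken, which is exactly the "how much of the source has been translated" signal the mechanism exploits.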

For context gate, we are looking at it…


Jean, will you merge this PR or not?

FYI, context gate is implemented in OpenNMT-py:


Thanks! I will integrate the context gates too, since they are pretty straightforward, and I will also look at the differences between the coverage implementations before merging the PR.
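To illustrate why the context gate is straightforward to add: it is a sigmoid gate computed from the previous decoder state, the previous target embedding, and the attention context, which then interpolates per-dimension between source-side and target-side information. A minimal NumPy sketch follows, presumably matching the gate described in the paper quoted above; the weight names and the "gate both sides" variant shown here are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
hid = 4
s_prev = rng.normal(size=hid)        # previous decoder hidden state
y_prev = rng.normal(size=hid)        # embedding of the previous target word
c_t = rng.normal(size=hid)           # source context from attention

W_s = rng.normal(size=(hid, hid))    # illustrative gate weights
W_y = rng.normal(size=(hid, hid))
W_c = rng.normal(size=(hid, hid))

# gate decides, per dimension, how much source vs. target information flows in
z = sigmoid(W_s @ s_prev + W_y @ y_prev + W_c @ c_t)

# "gate both" variant: scale the source context by z, the target side by (1 - z)
gated_source = z * c_t
gated_target = (1.0 - z) * s_prev
```

In a real decoder the two gated vectors would replace the ungated inputs to the next hidden-state computation; the only new parameters are the three gate matrices, which is why the change is cheap to bolt onto an existing attention decoder.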


I just tested the PR on a small system (2x500) with a small 500k corpus (fr-en).
I see almost zero improvement with nn10.
What was the task where you saw a significant improvement?

Chinese to English, on a 2M-sentence data set.