Improve BLEU by Coverage and Context Gate

I found that using a coverage mechanism significantly improves a standard attention-based NMT system by +1.8 BLEU, and that incorporating a context gate gives a further improvement of +1.6 BLEU (i.e., +3.4 BLEU in total).
Is there any plan to implement these functions?

Hello, which coverage mechanism are you referring to? For reference, I did implement different coverage mechanisms, available here: - and found significant gains on small datasets (a few million sentences), but the effect wears off on larger datasets (10M+ sentences).
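For readers unfamiliar with the idea: a coverage mechanism keeps a running total of the attention mass each source position has received, and feeds that total back into the attention scoring so the model is discouraged from attending to the same words repeatedly. Here is a minimal NumPy sketch of that accumulation loop; all weight names, shapes, and the fixed decoder state are illustrative, not the actual implementation discussed in this thread.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
src_len, hid = 5, 4
H = rng.normal(size=(src_len, hid))   # encoder states, one row per source word
W_h = rng.normal(size=(hid, hid))     # illustrative projection weights
W_s = rng.normal(size=(hid, hid))
w_c = rng.normal(size=(hid,))         # projection of the scalar coverage value
v = rng.normal(size=(hid,))

coverage = np.zeros(src_len)          # accumulated attention per source position
s = rng.normal(size=hid)              # decoder state (kept fixed here for brevity)

for t in range(3):
    # attention energy per source position, with coverage fed back in
    e = np.array([v @ np.tanh(W_h.T @ H[i] + W_s.T @ s + w_c * coverage[i])
                  for i in range(src_len)])
    a = softmax(e)
    coverage += a                     # accumulate attention mass
    context = a @ H                   # attended source context for this step
```

Because each step's attention weights sum to 1, the coverage vector sums to the number of decoding steps taken, which is exactly the "how much of the source has been translated" signal the mechanism exploits.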

For context gate, we are looking at it…


Jean, will you merge this PR or not?

FYI, context gate is implemented in OpenNMT-py:


Thanks! I will integrate the context gates too, since they are pretty straightforward, and I will also look at the differences between the coverage implementations before merging the PR.
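To illustrate why the context gate is straightforward to add: it is a sigmoid gate computed from the previous decoder state, the previous target embedding, and the attention context, which then interpolates per-dimension between source-side and target-side information. A minimal NumPy sketch follows, presumably matching the gate described in the paper quoted above; the weight names and the "gate both sides" variant shown here are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
hid = 4
s_prev = rng.normal(size=hid)        # previous decoder hidden state
y_prev = rng.normal(size=hid)        # embedding of the previous target word
c_t = rng.normal(size=hid)           # source context from attention

W_s = rng.normal(size=(hid, hid))    # illustrative gate weights
W_y = rng.normal(size=(hid, hid))
W_c = rng.normal(size=(hid, hid))

# gate decides, per dimension, how much source vs. target information flows in
z = sigmoid(W_s @ s_prev + W_y @ y_prev + W_c @ c_t)

# "gate both" variant: scale the source context by z, the target side by (1 - z)
gated_source = z * c_t
gated_target = (1.0 - z) * s_prev
```

In a real decoder the two gated vectors would replace the ungated inputs to the next hidden-state computation; the only new parameters are the three gate matrices, which is why the change is cheap to bolt onto an existing attention decoder.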


I just tested the PR on a small system (2x500) with a small 500k corpus (fr-en).
I see almost zero improvement with nn10.
What was the task where you saw a significant improvement?

Chinese to English, on a 2M-sentence data set.