Improve BLEU with Coverage and Context Gate

(DucAnhLe) #1

I found that using a coverage mechanism significantly improves on a standard attention-based NMT system by +1.8 BLEU, and that incorporating a context gate yields a further +1.6 BLEU (i.e., +3.4 BLEU in total).
Is there any plan to implement these functions?

(jean.senellart) #2

Hello, which coverage mechanism are you referring to? For reference, I implemented several coverage mechanisms, available here: - and found significant gains on small datasets (a few million sentences), but the effect wears off on larger datasets (10M+).
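For readers unfamiliar with the idea: a coverage mechanism keeps a running sum of past attention weights per source position and feeds it back into the attention score, discouraging the model from over- or under-translating any source word. A minimal NumPy sketch of one such variant (additive attention with a linear coverage term; all names here are illustrative, not the actual PR code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_with_coverage(dec_state, enc_states, coverage, W, U, V, v):
    # Additive attention score per source position i, with a coverage term:
    #   e_i = v . tanh(W h_i + U s + V * c_i)
    # where c_i is the attention mass position i has already received.
    scores = np.array([
        v @ np.tanh(W @ h + U @ dec_state + V * c)
        for h, c in zip(enc_states, coverage)
    ])
    alpha = softmax(scores)
    coverage = coverage + alpha                      # accumulate attention mass
    context = (alpha[:, None] * enc_states).sum(axis=0)
    return context, alpha, coverage
```

At each decoding step the returned `coverage` is passed back in, so positions that have already been attended to get their scores adjusted by the learned `V` term.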

For context gate, we are looking at it…

(Vincent Nguyen) #3

Jean, will you merge this PR or not?

(Guillaume Klein) #4

FYI, the context gate is implemented in OpenNMT-py:
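Roughly, a context gate is a learned elementwise gate that decides, per dimension, how much the source context (the attention output) versus the target-side signal should drive the next decoder update. A minimal NumPy sketch of the gating step, assuming illustrative weight names (`Wz`, `Uz`, `Cz`) and a simple combination rule rather than the exact OpenNMT-py formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(prev_state, prev_emb, src_context, Wz, Uz, Cz):
    # Elementwise gate z in (0, 1): how much source context (vs. the
    # target-side state) flows into the decoder update at this step.
    z = sigmoid(Wz @ prev_state + Uz @ prev_emb + Cz @ src_context)
    return z * src_context + (1.0 - z) * prev_state
```

With all weights at zero the gate sits at 0.5 and the output is an even blend of source context and decoder state; training moves the gate toward the source side for content words and toward the target side for fluency-driven tokens.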

(jean.senellart) #5

Thanks! I will integrate the context gates too; it is pretty straightforward. I will also look at the differences between the coverage implementations before merging the PR.

(Vincent Nguyen) #6

I just tested the PR on a small system (2x500) with a small 500k corpus (fr-en).
I see almost zero improvement with nn10.
What was the task where you saw a significant improvement?

(jean.senellart) #7

Chinese to English, on a 2M-sentence dataset.