Local Attention

This model is on base of global attention mechanism, however, I have a question: How to use local attention mechanism to solve point to point not long sentence to long sentence question ? Can you say in detail? Thanks~

We haven’t yet implemented local attention, but we think it should be easy to add.

Currently global attention is implemented here:

Local attention should be (1) added a similar unit, (2) adding a command-line option.

We’ll probably get to it soon, but would love a pool request.

2 posts were split to a new topic: How to control length of output sequence

First implementation here:

and thread for OpenNMT-py:

So far I do not reproduce Luong results - testing with window size of 11. However the implementation is actually trickier than global attention because we need to position window and get correctly gradient calculation - anyone interested in code/formula review?