This model is based on the global attention mechanism, but I have a question: how can a local attention mechanism be used to handle point-to-point problems, rather than long-sentence-to-long-sentence ones? Can you explain in detail? Thanks~
We haven’t yet implemented local attention, but we think it should be easy to add.
Currently global attention is implemented here:
Local attention should be (1) added as a similar unit, and (2) exposed via a command-line option.
We’ll probably get to it soon, but would love a pull request.
First implementation here:
and thread for OpenNMT-py:
So far I have not reproduced Luong's results, testing with a window size of 11. The implementation is actually trickier than global attention because we need to position the window and get the gradient calculation right. Anyone interested in a code/formula review?
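For anyone following along, here is a minimal NumPy sketch of the local-p (predictive alignment) variant from Luong et al. (2015), which is the formulation being discussed: the decoder predicts a window centre p_t, content scores are computed only inside the window, and the softmax weights are reweighted by a Gaussian centred at p_t. Function names, the dot-product score, and the parameter shapes are my own illustrative choices, not the actual OpenNMT implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def luong_local_p_attention(h_t, src, W_p, v_p, D=5):
    """Sketch of Luong-style local-p attention (predictive alignment).

    h_t : (d,)   decoder hidden state at the current step
    src : (S, d) encoder hidden states
    W_p : (d, d) and v_p : (d,) learned position-prediction parameters
    D   : half window size (the window spans at most 2D + 1 positions)
    """
    S = src.shape[0]
    # Predicted window centre p_t in [0, S] (Luong et al., eq. 9).
    p_t = S * sigmoid(v_p @ np.tanh(W_p @ h_t))
    # Integer window bounds, clipped to the source sentence; this clipping
    # is one of the fiddly parts mentioned above.
    lo = max(0, int(p_t) - D)
    hi = min(S, int(p_t) + D + 1)
    window = src[lo:hi]                       # (w, d)
    # Content scores inside the window only (dot score), softmax-normalised.
    align = softmax(window @ h_t)             # (w,)
    # Gaussian reweighting favouring positions near p_t, sigma = D / 2
    # (Luong et al., eq. 10). Note p_t is real-valued, so gradients can
    # flow back into W_p and v_p in an autograd framework.
    sigma = D / 2.0
    positions = np.arange(lo, hi)
    align = align * np.exp(-((positions - p_t) ** 2) / (2 * sigma ** 2))
    # Context vector: weighted sum of the windowed encoder states.
    context = align @ window                  # (d,)
    return context, align, p_t
```

After the Gaussian reweighting the weights no longer sum to one, which matches the paper; the trickiness referred to above is mostly the window clipping at sentence boundaries and keeping p_t differentiable.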