Local Attention

jinyeqiong · December 26, 2016, 2:51am

This model is based on the global attention mechanism. I have a question: how can the local attention mechanism be used to handle point-to-point problems, rather than long-sentence-to-long-sentence ones? Could you explain in detail? Thanks~

srush · December 26, 2016, 9:22pm

We haven’t yet implemented local attention, but we think it should be easy to add.

Currently global attention is implemented here:

github.com

OpenNMT/OpenNMT/blob/master/onmt/modules/GlobalAttention.lua

require('nngraph')

--[[ Global attention takes a matrix and a query vector. It
then computes a parameterized convex combination of the matrix
based on the input query.

    H_1 H_2 H_3 ... H_n
     q   q   q       q
      |  |   |       |
       \ |   |      /
           .....
         \   |  /
             a

Constructs a unit mapping:
  $$(H_1 .. H_n, q) => (a)$$
  Where H is of `batch x n x dim` and q is of `batch x dim`.

  The full function is  $$\tanh(W_2 [(softmax((W_1 q + b_1) H) H), q] + b_2)$$.

This file has been truncated.
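
For readers who want to see the formula in that comment worked through outside Torch, here is a minimal single-example sketch in plain Python (no batching, no learning). It follows the formula as written in the comment, which may differ in details from the actual Lua module. The name `global_attention` and the helper functions are illustrative only and are not part of the OpenNMT API.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(H, q, W1, b1, W2, b2):
    """Sketch of tanh(W2 [softmax((W1 q + b1) H) H, q] + b2).

    H is n x dim (one row per source position), q has length dim,
    W1 is dim x dim, W2 is dim x (2 * dim). Shapes are assumptions
    taken from the comment above, for a single example.
    """
    # Project the query, then score every source position against it.
    target = [t + b for t, b in zip(matvec(W1, q), b1)]
    scores = [sum(h * t for h, t in zip(row, target)) for row in H]
    weights = softmax(scores)

    # Context vector: convex combination of the rows of H.
    dim = len(q)
    context = [sum(w * row[j] for w, row in zip(weights, H)) for j in range(dim)]

    # Concatenate context and query, then apply the output layer.
    combined = context + list(q)
    out = [math.tanh(v + b) for v, b in zip(matvec(W2, combined), b2)]
    return out, weights
```

The returned `weights` sum to 1 over all n source positions, which is what makes this attention "global".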

Local attention should require (1) adding a similar unit and (2) adding a command-line option.

We’ll probably get to it soon, but would love a pull request.

jean.senellart · July 8, 2017, 8:33am

2 posts were split to a new topic: How to control length of output sequence

jean.senellart · July 8, 2017, 8:37am

First implementation here:

and thread for OpenNMT-py:

So far I have not been able to reproduce Luong's results; I am testing with a window size of 11. However, the implementation is actually trickier than global attention, because we need to position the window and get the gradient calculation right. Is anyone interested in a code/formula review?
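
A minimal plain-Python sketch of the predictive local attention (local-p) described by Luong et al. (2015) may help anchor such a review. It is not the implementation referred to above, and all names (`predict_position`, `local_attention_weights`) are hypothetical. Assumptions: a window size of 11 is read as a half-width D = 5; the Gaussian uses sigma = D / 2 as in the paper; positions are 0-indexed, so the predicted centre is scaled to [0, n - 1]; the centre is rounded to the nearest integer to pick the window.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict_position(h_t, W_p, v_p, n):
    """p_t = (n - 1) * sigmoid(v_p . tanh(W_p h_t)), a real-valued centre.

    Scaling by (n - 1) is a 0-indexed adaptation assumed for this sketch.
    """
    hidden = [math.tanh(x) for x in matvec(W_p, h_t)]
    z = sum(v * h for v, h in zip(v_p, hidden))
    return (n - 1) * (1.0 / (1.0 + math.exp(-z)))


def local_attention_weights(scores, p, D):
    """Softmax inside the window [p - D, p + D], scaled by a Gaussian on p.

    scores: raw alignment scores for all n source positions.
    Positions outside the window get weight 0.
    """
    n = len(scores)
    center = int(math.floor(p + 0.5))
    lo = max(0, center - D)          # clip the window at the sentence start
    hi = min(n - 1, center + D)      # and at the sentence end
    window = range(lo, hi + 1)

    # Stable softmax restricted to the window.
    m = max(scores[s] for s in window)
    exps = {s: math.exp(scores[s] - m) for s in window}
    total = sum(exps.values())

    sigma = D / 2.0
    weights = [0.0] * n
    for s in window:
        gauss = math.exp(-((s - p) ** 2) / (2.0 * sigma ** 2))
        weights[s] = (exps[s] / total) * gauss
    return weights
```

Two points that seem relevant to the gradient question: in this sketch the weights are not renormalized after the Gaussian scaling, so they generally sum to less than 1; and the only differentiable path from the loss to `W_p` and `v_p` is through the Gaussian term, since the rounding and clipping that select the window carry no gradient.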