Extracting and visualizing the decoder attention weights

Is there a way to extract and visualize the attention weights for a given parallel sentence pair in the seq2seq learning framework? As shown in the figure below, each pixel (a gray level between 0 and 1) represents the weight (i, j) between the source token at index i and the target token at index j.

Bahdanau et al. 2015 “Neural machine translation by jointly learning to align and translate”

If you are using translate.lua, the attention for each sentence is in the variable results[b].preds[n].attention (b is the index in the batch, and n is the index in the n-best list). It is a T x S tensor, where S is the source sentence length and T is the length of that particular translation.
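Once you have a T x S matrix like that, a quick way to eyeball it without any plotting library is a text heat map. The following is only a rough sketch in Python, not part of OpenNMT: the helper names (render_attention, hard_alignment), the shade characters, and the toy sentence pair are all made up for illustration, and it assumes the weights have already been exported as nested lists with one row per target token.

```python
# Hypothetical helpers: these are illustrative and not part of OpenNMT.
SHADES = " .:-=+*#%@"  # ten levels, from lightest to darkest


def render_attention(attention, source_tokens, target_tokens):
    """Return one text line per target token.

    attention[t][s] is the weight of source token s for target token t,
    assumed to lie in [0, 1] (a T x S matrix as nested lists).
    """
    if len(attention) != len(target_tokens):
        raise ValueError("expected one row per target token")
    width = max(len(tok) for tok in target_tokens)
    lines = []
    for tok, row in zip(target_tokens, attention):
        if len(row) != len(source_tokens):
            raise ValueError("expected one column per source token")
        # Map each weight to a shade index, clamped to the valid range.
        cells = "".join(
            SHADES[max(0, min(int(w * len(SHADES)), len(SHADES) - 1))]
            for w in row
        )
        lines.append(f"{tok:>{width}} |{cells}|")
    return lines


def hard_alignment(attention):
    """For each target position, the index of the most-attended source token."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


# Toy data, invented for this example only.
source = ["the", "cat", "sleeps"]
target = ["le", "chat", "dort"]
weights = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.00, 0.15, 0.85],
]

for line in render_attention(weights, source, target):
    print(line)
print(hard_alignment(weights))
```

Darker characters mean higher weight, so a roughly monotone translation shows up as a diagonal. For a proper gray-scale image like the one in the paper, you would hand the same matrix to a plotting library instead.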


I think this is what you’re looking for:
NMT Attention Alignment Visualizations

You can run translate.lua with the -save_attention parameter to save the attention weights to a file.

The visualization program above takes the attention file as input and prints out a visualization of the word alignments with their weights.

I was wondering what the values obtained by saving the attention mean. I’ve used the -save_attention parameter to save the values to a file. What I obtained is something like this:

1 ||| source sentence ||| score ||| target sentence tokenized ||| number number
Matrix like numbers

What does the score mean?
What do the last two numbers mean?

Thank you in advance


  • score: the cumulative log-likelihood of the sentence.
  • two numbers: the source length and the target length.
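Putting the header fields together, here is a rough Python sketch of how such a file could be read back. It is based only on the layout shown in this thread, so several things are assumptions: that the header has exactly five fields separated by "|||", that the matrix follows as T rows of S whitespace-separated numbers (T and S taken from the last header field), and that blank lines can be ignored. The function name parse_attention_blocks and the sample data are made up; check the shapes against your own file before relying on it.

```python
def parse_attention_blocks(text):
    """Parse blocks made of one header line followed by a weight matrix.

    Assumed header layout (taken from the example in this thread):
        id ||| source ||| score ||| target ||| S T
    followed by T rows of S numbers. This layout is an assumption.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    blocks = []
    i = 0
    while i < len(lines):
        fields = [f.strip() for f in lines[i].split("|||")]
        if len(fields) != 5:
            raise ValueError(f"unexpected header line: {lines[i]!r}")
        sent_id, source, score, target, lengths = fields
        src_len, tgt_len = (int(x) for x in lengths.split())
        rows = [
            [float(x) for x in ln.split()]
            for ln in lines[i + 1 : i + 1 + tgt_len]
        ]
        # Fail loudly if the file does not match the assumed T x S layout.
        if len(rows) != tgt_len or any(len(r) != src_len for r in rows):
            raise ValueError(f"matrix shape mismatch for sentence {sent_id}")
        blocks.append(
            {
                "id": sent_id,
                "source": source,
                "score": float(score),  # cumulative log-likelihood
                "target": target,
                "attention": rows,
            }
        )
        i += 1 + tgt_len
    return blocks


# Invented sample in the assumed layout.
sample = """\
1 ||| the cat ||| -1.25 ||| le chat ||| 2 2
0.9 0.1
0.2 0.8
"""

for block in parse_attention_blocks(sample):
    n_target = len(block["attention"])
    # Dividing by the target length gives a per-token log-likelihood,
    # which is easier to compare across sentences of different lengths.
    print(block["id"], block["score"], block["score"] / n_target)
```

Since the score is a sum of log-probabilities, longer translations naturally get lower (more negative) values, which is why the per-token figure is printed alongside it.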