I am new to OpenNMT and do not understand the forward pass in the Decoder (code here). It uses the StackedLSTM module and unrolls the LSTM within the for loop on lines 113-121. At each iteration it receives a new vector of cell and output states from its StackedLSTM, but the way I read the code, it discards this new cell state and keeps passing the initial state at every iteration (line 118). When I changed that line so that it overrides the `hidden` variable with the new cell state, the behavior during training did not seem to change. I guess I must be misunderstanding something; can someone help?
Line 118 is indeed incorrect. It should be:
output, hidden = self.rnn(emb_t, hidden)
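To make the difference concrete, here is a minimal sketch of the unrolled loop. The names (`emb_t`, `rnn`, `hidden`) mirror the loop discussed above, but the `rnn` here is a toy scalar state update, not the real StackedLSTM:

```python
def rnn(emb_t, hidden):
    """Toy recurrent step: the new hidden state mixes the input
    with a decayed copy of the old hidden state."""
    new_hidden = 0.5 * hidden + emb_t
    output = new_hidden
    return output, new_hidden

def unroll(embeddings, init_hidden, buggy):
    hidden = init_hidden
    outputs = []
    for emb_t in embeddings:
        if buggy:
            # The bug on line 118: the returned hidden state is
            # discarded, so every step sees the initial state again.
            output, _ = rnn(emb_t, init_hidden)
        else:
            # The fix: override `hidden` so the state propagates
            # through time.
            output, hidden = rnn(emb_t, hidden)
        outputs.append(output)
    return outputs

buggy_out = unroll([1.0, 1.0, 1.0], 0.0, buggy=True)
fixed_out = unroll([1.0, 1.0, 1.0], 0.0, buggy=False)
```

With the bug, identical inputs produce identical outputs at every timestep (`[1.0, 1.0, 1.0]`); with the fix, the state accumulates across timesteps (`[1.0, 1.5, 1.75]`).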
Will get fixed by:
Oh alright thanks for clarifying.
So in the current model, the prediction of word i is a function f(i-1, s_0) of the previous word and the last hidden state of the encoder, is that right? I find it surprising that a bigram language model can produce good translations.
The prediction of word i is a function of:
- the previous decoder hidden state (for t = 0, the last encoder hidden state)
- the previous word
- the previous context vector (the output of the attention layer)
- the outputs of every encoder timestep
Hello Guillaume, thanks again. Doesn't the error on line 118 mean that the decoder hidden states do not propagate through the network, and thus are equal for every t?
Exactly. Taking this coding mistake into account, the word prediction is not a function of the previous decoder hidden states but only of the last encoder hidden states.
Alright, I think I got it now. Thanks!