I am new to OpenNMT and do not understand the forward pass in the Decoder (code here). It uses the StackedLSTM module and unrolls the LSTM within the for loop on lines 113-121. At each iteration it receives a new vector of cell and output states from its StackedLSTM, but the way I read the code, it discards this new cell state and keeps passing the initial state at every iteration (line 118). When I changed that line so that it overrides the `hidden` variable with the new cell state, the behavior during training did not seem to change. I guess I must be misunderstanding something; can someone help?
Line 118 is indeed incorrect. It should be:
output, hidden = self.rnn(emb_t, hidden)
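To make the difference concrete, here is a minimal sketch of the unrolled loop. The names (`emb_t`, `rnn`, `hidden`) mirror the loop discussed above, but the `rnn` here is a toy scalar state update, not the real StackedLSTM:

```python
def rnn(emb_t, hidden):
    """Toy recurrent step: the new hidden state mixes the input
    with a decayed copy of the old hidden state."""
    new_hidden = 0.5 * hidden + emb_t
    output = new_hidden
    return output, new_hidden

def unroll(embeddings, init_hidden, buggy):
    hidden = init_hidden
    outputs = []
    for emb_t in embeddings:
        if buggy:
            # The bug on line 118: the returned hidden state is
            # discarded, so every step sees the initial state again.
            output, _ = rnn(emb_t, init_hidden)
        else:
            # The fix: override `hidden` so the state propagates
            # through time.
            output, hidden = rnn(emb_t, hidden)
        outputs.append(output)
    return outputs

buggy_out = unroll([1.0, 1.0, 1.0], 0.0, buggy=True)
fixed_out = unroll([1.0, 1.0, 1.0], 0.0, buggy=False)
```

With the bug, identical inputs produce identical outputs at every timestep (`[1.0, 1.0, 1.0]`); with the fix, the state accumulates across timesteps (`[1.0, 1.5, 1.75]`).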
Will get fixed by:
Oh alright thanks for clarifying.
So in the current model, the prediction of word i is a function f(i-1, s_0) of the previous word and the last hidden state of the encoder, is that right? I find it surprising that a bigram language model can produce good translations.
The prediction of word i is a function of:
- the previous decoder hidden state (for t = 0, the last encoder hidden state)
- the previous word
- the previous context vector (the output of the attention layer)
- the outputs of every encoder timestep
Hello Guillaume, thanks again. Doesn't the error on line 118 mean that the decoder hidden states do not propagate through the network, and thus are equal for every t?
Exactly. Taking this coding mistake into account, the word prediction is not a function of the previous decoder hidden states but only of the last encoder hidden states.
Alright, I think I got it now. Thanks!