Hello.
I am trying to load the Encoder & Decoder separately and train the model with the ‘optim’ package. Assume the Encoder & Decoder are pre-trained by OpenNMT’s train.lua script.
Here is my simplified code.
Part 1. Load & initialize the model.
local checkpoint = torch.load(path_to_pretrained_model)
my_encoder = onmt.Factory.loadEncoder(checkpoint.models.encoder) -- brnn
my_decoder = onmt.Factory.loadDecoder(checkpoint.models.decoder)
onmt.utils.Cuda.convert(my_encoder)
onmt.utils.Cuda.convert(my_decoder)
my_encoder:training()
my_decoder:training()
params_e, grad_e = my_encoder:getParameters() -- flattened parameter / gradient tensors
params_d, grad_d = my_decoder:getParameters()
Part 2. Update the Encoder & Decoder with the optim package.
_, loss = optim.adam(feval_decoder, params_d, optim_state_decoder) -- loss[1] holds the objective value
optim.adam(feval_encoder, params_e, optim_state_encoder)
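For reference, the feval closures follow the forward/backward pattern I see in train.lua (if I read it correctly). This is only a simplified sketch, not my exact code; batch and criterion come from my data/criterion setup, and encGradStatesOut / gradContext are just my names for what the decoder’s backward returns.

local encGradStatesOut, gradContext

local function feval_decoder(_)
  grad_d:zero()
  -- forward through both models
  local encStates, context = my_encoder:forward(batch)
  local decOutputs = my_decoder:forward(batch, encStates, context)
  -- the decoder's backward returns the gradients w.r.t. the encoder states and the context;
  -- save them for feval_encoder
  local loss
  encGradStatesOut, gradContext, loss = my_decoder:backward(batch, decOutputs, criterion)
  return loss, grad_d
end

local function feval_encoder(_)
  grad_e:zero()
  -- backpropagate the saved gradients through the encoder; this is what should fill grad_e
  my_encoder:backward(batch, encGradStatesOut, gradContext)
  return 0, grad_e
end

So feval_decoder runs the full forward pass plus the decoder backward, and feval_encoder then backpropagates the saved gradients through the encoder.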
The problem is that the gradient norm of the encoder (i.e. grad_e:norm()) is always zero.
The one thing I am not confident about is whether my_encoder:getParameters() retrieves all the parameter and gradient tensors that exist in the Encoder. It seems that Model:initParams() calls :getParameters() on each of the ‘modules’ separately, rather than on the whole Encoder or Decoder.
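To probe that doubt, one simple diagnostic (just a sketch using the standard nn API) is to run a single backward pass manually and then compare a per-module gradient tensor with the flattened one:

local _, gradTensors = my_encoder:parameters()
-- if gradTensors[1]:norm() is non-zero but grad_e:norm() is zero, the flat tensor lost
-- its storage sharing (e.g. getParameters() was called more than once);
-- if both are zero, no gradient is reaching the encoder at all
print(gradTensors[1]:norm(), grad_e:norm())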
I really appreciate any comments.
Thank you