wz337 (Wz337)
October 7, 2019, 7:49am
Hi,
I am trying to load the pre-trained translation model (German-English) from http://opennmt.net/Models-py/ into a model object rather than just loading the raw state_dict. The reason is that I am doing research on model interpretability and I would like to access the embedding layer of the model.
However, I am not sure which encoder and decoder, or which model parameters (bidirectional or not, number of layers), were used to train the pre-trained model.
Essentially, I want to load the state_dict into a model object and be able to see something like this:
Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(7855, 256)
    (rnn): LSTM(256, 512, num_layers=2, dropout=0.5)
    (dropout): Dropout(p=0.5, inplace=False)
  )
  (decoder): Decoder(
    (embedding): Embedding(5893, 256)
    (rnn): LSTM(256, 512, num_layers=2, dropout=0.5)
    (linear): Linear(in_features=512, out_features=5893, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
)
Thanks. If anyone could help me with it, that would be great!
Maybe @francoishernandez can advise here.
Hi @wz337,
Everything is stored in the checkpoint file.
You can load it with ckpt = torch.load("iwslt-brnn2.s131_acc_62.71_ppl_7.74_e20.pt")
and then browse what’s inside.
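(If you are on a machine without a GPU, you may need to pass map_location; a minimal sketch, assuming the checkpoint was saved from a GPU run:)

import torch

# Assumption: the checkpoint was saved on a GPU, so map_location is needed
# to load it on a CPU-only machine.
ckpt = torch.load("iwslt-brnn2.s131_acc_62.71_ppl_7.74_e20.pt", map_location="cpu")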
E.g.:
ckpt.keys()
--> dict_keys(['model', 'generator', 'vocab', 'opt', 'epoch', 'optim'])
ckpt['opt']
--> Namespace(accum_count=1, adagrad_accumulator_init=0, adam_beta1=0.9, adam_beta2=0.999, batch_size=64, batch_type='sents', brnn=True, brnn_merge='concat', cnn_kernel_width=3, context_gate=None, copy_attn=False, copy_attn_force=False, copy_loss_by_seqlength=False, coverage_attn=False, data='/n/holylfs/LABS/rush_lab/data/iwslt14-de-en/data-onmt-master/iwslt14.tokenized.de-en.150-150.3-3', dec_layers=2, decay_method='', decoder_type='rnn', dropout=0.3, enc_layers=2, encoder_type='brnn', epochs=25, exp='', exp_host='', feat_merge='concat', feat_vec_exponent=0.7, feat_vec_size=-1, fix_word_vecs_dec=False, fix_word_vecs_enc=False, global_attention='general', gpuid=[3], input_feed=1, label_smoothing=0.0, lambda_coverage=1, layers=-1, learning_rate=1.0, learning_rate_decay=0.5, max_generator_batches=32, max_grad_norm=5, model_type='text', normalization='sents', optim='sgd', param_init=0.1, position_encoding=False, pre_word_vecs_dec=None, pre_word_vecs_enc=None, report_every=50, reuse_copy_attn=False, rnn_size=500, rnn_type='LSTM', sample_rate=16000, save_model='/n/holylfs/LABS/rush_lab/jc/onmt-master/iwslt14-de-en/models/baseline-brnn2.s131/baseline-brnn2.s131', seed=131, share_decoder_embeddings=False, share_embeddings=False, src_word_vec_size=500, start_checkpoint_at=0, start_decay_at=8, start_epoch=1, tensorboard=False, tensorboard_log_dir='runs', tgt_word_vec_size=500, train_from='', truncated_decoder=0, valid_batch_size=32, warmup_steps=4000, window_size=0.02, word_vec_size=-1)
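The fields of that Namespace answer your architecture question directly; for example (field names taken from the output above):

opt = ckpt['opt']
print(opt.encoder_type)   # 'brnn' -> bidirectional RNN encoder
print(opt.enc_layers)     # 2 encoder layers
print(opt.dec_layers)     # 2 decoder layers
print(opt.rnn_type)       # 'LSTM'
print(opt.rnn_size)       # 500 hidden units
print(opt.src_word_vec_size, opt.tgt_word_vec_size)  # 500 500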
And for the state_dict structure:
ckpt['model'].keys()
--> dict_keys(['encoder.embeddings.make_embedding.emb_luts.0.weight', 'encoder.rnn.weight_ih_l0', 'encoder.rnn.weight_hh_l0', 'encoder.rnn.bias_ih_l0', 'encoder.rnn.bias_hh_l0', 'encoder.rnn.weight_ih_l0_reverse', 'encoder.rnn.weight_hh_l0_reverse', 'encoder.rnn.bias_ih_l0_reverse', 'encoder.rnn.bias_hh_l0_reverse', 'encoder.rnn.weight_ih_l1', 'encoder.rnn.weight_hh_l1', 'encoder.rnn.bias_ih_l1', 'encoder.rnn.bias_hh_l1', 'encoder.rnn.weight_ih_l1_reverse', 'encoder.rnn.weight_hh_l1_reverse', 'encoder.rnn.bias_ih_l1_reverse', 'encoder.rnn.bias_hh_l1_reverse', 'decoder.embeddings.make_embedding.emb_luts.0.weight', 'decoder.rnn.layers.0.weight_ih', 'decoder.rnn.layers.0.weight_hh', 'decoder.rnn.layers.0.bias_ih', 'decoder.rnn.layers.0.bias_hh', 'decoder.rnn.layers.1.weight_ih', 'decoder.rnn.layers.1.weight_hh', 'decoder.rnn.layers.1.bias_ih', 'decoder.rnn.layers.1.bias_hh', 'decoder.attn.linear_in.weight', 'decoder.attn.linear_out.weight'])
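Since your goal is the embedding layer, note that you can pull the embedding weights straight out of the state_dict without rebuilding the full model (a sketch using the keys listed above; torch.nn.Embedding.from_pretrained just wraps the tensor):

import torch

# Embedding look-up tables, keyed exactly as in the state_dict listing above
enc_emb_weight = ckpt['model']['encoder.embeddings.make_embedding.emb_luts.0.weight']
dec_emb_weight = ckpt['model']['decoder.embeddings.make_embedding.emb_luts.0.weight']
print(enc_emb_weight.shape)  # (src_vocab_size, src_word_vec_size)

# Wrap the raw weights in an nn.Embedding for interpretability experiments
enc_embedding = torch.nn.Embedding.from_pretrained(enc_emb_weight)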
Hope this helps!