I've been trying to work out where the parameter counts come from, but I can't reproduce the numbers reported under "number of parameters" in the training log.
I know it's a very basic question, but I really want to understand how the parameters add up.
This is my model, using word_vec_size=50 and rnn_size=128:
(encoder): RNNEncoder(
  (embeddings): Embeddings(
    (make_embedding): Sequential(
      (emb_luts): Elementwise(
        (0): Embedding(50004, 50, padding_idx=1)
      )
    )
  )
  (rnn): LSTM(50, 64, dropout=0.3, bidirectional=True)
)
(decoder): InputFeedRNNDecoder(
  (embeddings): Embeddings(
    (make_embedding): Sequential(
      (emb_luts): Elementwise(
        (0): Embedding(50004, 50, padding_idx=1)
      )
    )
  )
  (dropout): Dropout(p=0.3)
  (rnn): StackedLSTM(
    (dropout): Dropout(p=0.3)
    (layers): ModuleList(
      (0): LSTMCell(178, 128)
    )
  )
  (attn): GlobalAttention(
    (linear_context): Linear(in_features=128, out_features=128, bias=False)
    (linear_query): Linear(in_features=128, out_features=128, bias=True)
    (v): Linear(in_features=128, out_features=1, bias=False)
    (linear_out): Linear(in_features=256, out_features=128, bias=True)
    (softmax): Softmax()
    (tanh): Tanh()
  )
  (copy_attn): GlobalAttention(
    (linear_context): Linear(in_features=128, out_features=128, bias=False)
    (linear_query): Linear(in_features=128, out_features=128, bias=True)
    (v): Linear(in_features=128, out_features=1, bias=False)
    (linear_out): Linear(in_features=256, out_features=128, bias=True)
    (softmax): Softmax()
    (tanh): Tanh()
  )
)
(generator): CopyGenerator(
  (linear): Linear(in_features=128, out_features=50004, bias=True)
  (linear_copy): Linear(in_features=128, out_features=1, bias=True)
)
* number of parameters: 11799973
encoder: 2559592
decoder: 9240381
I want to know how the two figures, encoder: 2559592 and decoder: 9240381, are calculated.
Here are the other training settings:
-layers 1
-global_attention mlp
-encoder_type brnn
-rnn_size 128
-word_vec_size 50
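
For what it's worth, here is a rough sketch of how the two totals can be reproduced from the printed layer shapes with plain arithmetic. It assumes PyTorch's usual parameter layouts (a Linear layer has in*out weights plus an optional bias of size out; an LSTM has four gates with separate input-to-hidden and hidden-to-hidden weights and two bias vectors per direction), and it assumes the logged "decoder" figure also counts the generator, since the numbers only match under that assumption. The helper function names are made up for illustration and are not part of any library.

```python
# Hypothetical helpers: plain arithmetic on the shapes shown in the printout.

def linear(n_in, n_out, bias=True):
    # Weight matrix plus optional bias vector.
    return n_in * n_out + (n_out if bias else 0)

def embedding(vocab, dim):
    # One vector of size dim per vocabulary entry.
    return vocab * dim

def lstm(n_in, hidden, bidirectional=False):
    # 4 gates, each with input-to-hidden and hidden-to-hidden weights,
    # plus two bias vectors (bias_ih and bias_hh) of size 4 * hidden.
    per_direction = 4 * hidden * (n_in + hidden) + 2 * 4 * hidden
    return per_direction * (2 if bidirectional else 1)

def global_attention_mlp(dim):
    # linear_context + linear_query + v + linear_out, as printed.
    return (linear(dim, dim, bias=False) + linear(dim, dim)
            + linear(dim, 1, bias=False) + linear(2 * dim, dim))

vocab, word_vec_size, rnn_size = 50004, 50, 128

# Encoder: embedding + bidirectional LSTM with rnn_size / 2 units per direction.
encoder = embedding(vocab, word_vec_size) + lstm(word_vec_size, rnn_size // 2,
                                                 bidirectional=True)

# Decoder: the LSTMCell input is 178 = 50 (embedding) + 128 (input feed).
decoder_body = (embedding(vocab, word_vec_size)
                + lstm(word_vec_size + rnn_size, rnn_size)
                + 2 * global_attention_mlp(rnn_size))  # attn and copy_attn

# Generator: linear (128 -> 50004) and linear_copy (128 -> 1), both with bias.
generator = linear(rnn_size, vocab) + linear(rnn_size, 1)

decoder = decoder_body + generator  # assumption: the log folds generator into decoder

print(encoder)            # 2559592
print(decoder)            # 9240381
print(encoder + decoder)  # 11799973
```

Broken down, the encoder is 2500200 (embedding) + 59392 (bidirectional LSTM), and the decoder is 2500200 (embedding) + 157696 (LSTMCell) + 2 * 65920 (the two attention modules) + 6450645 (generator).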