OpenNMT Forum

Transformer Error

I have these parameters for my model:

onmt_train -batch_size 128
-accum_count 3
-report_every 1
-world_size 1
-gpu_ranks 0
-layers 1
-valid_batch_size 64
-valid_steps 5000
-rnn_size 512
-data data/data
-pre_word_vecs_enc "data/embeddings.enc.pt"
-pre_word_vecs_dec "data/embeddings.dec.pt"
-src_word_vec_size 224
-tgt_word_vec_size 336
-fix_word_vecs_enc
-fix_word_vecs_dec
-save_model data/model
-save_checkpoint_steps 1000
-train_steps 100000
-model_type text
-encoder_type transformer
-decoder_type transformer
-rnn_type GRU
-global_attention mlp
-global_attention_function softmax
-early_stopping 500
-attention_dropout .3
-max_generator_batches 2
-tensorboard
-optim adam -adam_beta2 0.998 -warmup_steps 8000 -learning_rate 0.001
-max_grad_norm 0 -param_init 0 -param_init_glorot
-label_smoothing 0.1
I got this error:
[2019-10-30 19:30:04,869 INFO] * src vocab size = 193698
[2019-10-30 19:30:04,869 INFO] * tgt vocab size = 26009
[2019-10-30 19:30:04,870 INFO] Building model...
[2019-10-30 19:30:13,672 INFO] NMTModel(
(encoder): TransformerEncoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(193698, 224, padding_idx=1)
)
)
)
(transformer): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=512, out_features=512, bias=True)
(linear_values): Linear(in_features=512, out_features=512, bias=True)
(linear_query): Linear(in_features=512, out_features=512, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.3, inplace=False)
(final_linear): Linear(in_features=512, out_features=512, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=512, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=512, bias=True)
(layer_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.3, inplace=False)
(relu): ReLU()
(dropout_2): Dropout(p=0.3, inplace=False)
)
(layer_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.3, inplace=False)
)
)
(layer_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(26009, 336, padding_idx=1)
)
)
)
(transformer_layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=512, out_features=512, bias=True)
(linear_values): Linear(in_features=512, out_features=512, bias=True)
(linear_query): Linear(in_features=512, out_features=512, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.3, inplace=False)
(final_linear): Linear(in_features=512, out_features=512, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=512, out_features=512, bias=True)
(linear_values): Linear(in_features=512, out_features=512, bias=True)
(linear_query): Linear(in_features=512, out_features=512, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.3, inplace=False)
(final_linear): Linear(in_features=512, out_features=512, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=512, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=512, bias=True)
(layer_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.3, inplace=False)
(relu): ReLU()
(dropout_2): Dropout(p=0.3, inplace=False)
)
(layer_norm_1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.3, inplace=False)
)
)
(layer_norm): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
)
(generator): Sequential(
(0): Linear(in_features=512, out_features=26009, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2019-10-30 19:30:13,673 INFO] encoder: 46541760
[2019-10-30 19:30:13,673 INFO] decoder: 26286697
[2019-10-30 19:30:13,673 INFO] * number of parameters: 72828457
[2019-10-30 19:30:13,875 INFO] Starting training on GPU: [0]
[2019-10-30 19:30:13,875 INFO] Start training loop and validate every 5000 steps...
[2019-10-30 19:30:13,876 INFO] Loading dataset from data/data.train.0.pt
[2019-10-30 19:30:18,877 INFO] number of examples: 190000
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/bin/onmt_train", line 11, in <module>
load_entry_point('OpenNMT-py==1.0.0rc2', 'console_scripts', 'onmt_train')()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/bin/train.py", line 200, in main
train(opt)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/bin/train.py", line 86, in train
single_main(opt, 0)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/train_single.py", line 143, in main
valid_steps=opt.valid_steps)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/trainer.py", line 243, in train
report_stats)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/trainer.py", line 358, in _gradient_accumulation
outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/models/model.py", line 42, in forward
enc_state, memory_bank, lengths = self.encoder(src, lengths)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/encoders/transformer.py", line 127, in forward
out = layer(out, mask)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/OpenNMT_py-1.0.0rc2-py3.6.egg/onmt/encoders/transformer.py", line 48, in forward
input_norm = self.layer_norm(inputs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/modules/normalization.py", line 153, in forward
input, self.normalized_shape, self.weight, self.bias, self.eps)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch-1.3.0-py3.6-linux-x86_64.egg/torch/nn/functional.py", line 1696, in layer_norm
torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[512], expected input with shape [*, 512], but got input of size[128, 45, 224]

Can anyone help me?

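With the Transformer architecture, the embeddings and the hidden layers must have the same size. Your hidden size is 512 (-rnn_size 512), but the encoder embeddings are 224-dimensional, which is why LayerNorm rejects an input of size [128, 45, 224]. Using -word_vec_size 512 instead of -src_word_vec_size 224 -tgt_word_vec_size 336 should do the trick.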
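
To make the constraint concrete, here is a minimal sketch of a pre-flight check you could run before launching training. The function name `check_transformer_dims` is hypothetical and not part of OpenNMT-py; it only compares the three sizes taken from the command in the question.

```python
# Hypothetical helper, not an OpenNMT-py API: it only compares the sizes
# that the Transformer encoder and decoder need to agree on.
def check_transformer_dims(src_word_vec_size, tgt_word_vec_size, rnn_size):
    """Return a list of problems; an empty list means the sizes are consistent."""
    problems = []
    if src_word_vec_size != rnn_size:
        problems.append(
            f"encoder embeddings are {src_word_vec_size}-dim, hidden size is {rnn_size}"
        )
    if tgt_word_vec_size != rnn_size:
        problems.append(
            f"decoder embeddings are {tgt_word_vec_size}-dim, hidden size is {rnn_size}"
        )
    return problems


# Sizes from the failing command: -src_word_vec_size 224, -tgt_word_vec_size 336, -rnn_size 512.
failing = check_transformer_dims(224, 336, 512)

# Sizes after switching to -word_vec_size 512 with -rnn_size 512.
fixed = check_transformer_dims(512, 512, 512)

for problem in failing:
    print("mismatch:", problem)
print("problems after the fix:", len(fixed))
```

The first call reports a mismatch for both the encoder and the decoder. The traceback only shows the encoder one because the encoder runs first.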