I am working on image-to-text translation. When I set both the encoder and the decoder type to transformer, the model that gets built does not contain a transformer encoder (see the printout below), and training fails with the error at the end of the log.
Command: python3 train.py -model_type img -data demo/demo -save_model demo-model -gpu_ranks 0 -batch_size 4 -max_grad_norm 20 -learning_rate 0.1 -word_vec_size 80 -encoder_type transformer -decoder_type transformer -heads 1 -transformer_ff 128 -image_channel_size 1
[2019-02-27 10:44:10,649 INFO] * tgt vocab size = 4138
[2019-02-27 10:44:10,650 INFO] Building model...
[2019-02-27 10:44:16,671 INFO] NMTModel(
(encoder): ImageEncoder(
(layer1): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(layer2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(layer3): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(layer5): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(layer6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batch_norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(batch_norm2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(batch_norm3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn): LSTM(512, 500, num_layers=2, dropout=0.3)
(pos_lut): Embedding(1000, 512)
)
(decoder): TransformerDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(4138, 80, padding_idx=1)
)
)
)
(transformer_layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=500, out_features=500, bias=True)
(linear_values): Linear(in_features=500, out_features=500, bias=True)
(linear_query): Linear(in_features=500, out_features=500, bias=True)
(softmax): Softmax()
(dropout): Dropout(p=0.3)
(final_linear): Linear(in_features=500, out_features=500, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=500, out_features=500, bias=True)
(linear_values): Linear(in_features=500, out_features=500, bias=True)
(linear_query): Linear(in_features=500, out_features=500, bias=True)
(softmax): Softmax()
(dropout): Dropout(p=0.3)
(final_linear): Linear(in_features=500, out_features=500, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=500, out_features=128, bias=True)
(w_2): Linear(in_features=128, out_features=500, bias=True)
(layer_norm): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.3)
(relu): ReLU()
(dropout_2): Dropout(p=0.3)
)
(layer_norm_1): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.3)
)
(1): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=500, out_features=500, bias=True)
(linear_values): Linear(in_features=500, out_features=500, bias=True)
(linear_query): Linear(in_features=500, out_features=500, bias=True)
(softmax): Softmax()
(dropout): Dropout(p=0.3)
(final_linear): Linear(in_features=500, out_features=500, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=500, out_features=500, bias=True)
(linear_values): Linear(in_features=500, out_features=500, bias=True)
(linear_query): Linear(in_features=500, out_features=500, bias=True)
(softmax): Softmax()
(dropout): Dropout(p=0.3)
(final_linear): Linear(in_features=500, out_features=500, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=500, out_features=128, bias=True)
(w_2): Linear(in_features=128, out_features=500, bias=True)
(layer_norm): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.3)
(relu): ReLU()
(dropout_2): Dropout(p=0.3)
)
(layer_norm_1): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.3)
)
)
(layer_norm): LayerNorm(torch.Size([500]), eps=1e-06, elementwise_affine=True)
)
(generator): Sequential(
(0): Linear(in_features=500, out_features=4138, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2019-02-27 10:44:16,673 INFO] encoder: 9046272
[2019-02-27 10:44:16,673 INFO] decoder: 6676434
[2019-02-27 10:44:16,673 INFO] * number of parameters: 15722706
[2019-02-27 10:44:16,677 INFO] Starting training on GPU: [0]
[2019-02-27 10:44:16,677 INFO] Start training loop and validate every 10000 steps...
[2019-02-27 10:44:17,160 INFO] Loading dataset from demo/demo.train.0.pt, number of examples: 500
/usr/local/lib/python3.5/dist-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
var = torch.tensor(arr, dtype=self.dtype, device=device)
Traceback (most recent call last):
  File "train.py", line 109, in <module>
    main(opt)
  File "train.py", line 39, in main
    single_main(opt, 0)
  File "openNMT/OpenNMT-py-master/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "openNMT/OpenNMT-py-master/onmt/trainer.py", line 209, in train
    report_stats)
  File "openNMT/OpenNMT-py-master/onmt/trainer.py", line 318, in _gradient_accumulation
    outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "openNMT/OpenNMT-py-master/onmt/models/model.py", line 46, in forward
    memory_lengths=lengths)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "openNMT/OpenNMT-py-master/onmt/decoders/transformer.py", line 187, in forward
    src_batch, src_len = src_words.size()
ValueError: too many values to unpack (expected 2)
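My understanding (possibly wrong) is that the failing line, src_batch, src_len = src_words.size(), assumes the source is a 2-D tensor as it would be for text input, while with -model_type img the source batch is a 4-D image tensor, so the unpack raises exactly this ValueError. A minimal sketch of the mismatch, where the image shape is hypothetical and only chosen to match -batch_size 4 and -image_channel_size 1:

import torch

# The line from the traceback assumes src_words is 2-D, so unpacking its
# size into two values works for a text-like batch:
text_like = torch.zeros(30, 4, dtype=torch.long)   # a 2-D tensor
src_batch, src_len = text_like.size()              # fine: exactly two values

# With -model_type img the source is an image batch, which is 4-D
# (hypothetical shape: batch=4, channels=1, height=64, width=256):
img_like = torch.zeros(4, 1, 64, 256)
try:
    src_batch, src_len = img_like.size()           # same unpack as in the traceback
except ValueError as e:
    print(e)                                       # too many values to unpack (expected 2)

So it looks like the TransformerDecoder is not expecting an image source, which would also fit with the encoder above still being an ImageEncoder despite -encoder_type transformer. Is this combination supposed to be supported, and if so, what am I doing wrong?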