Correct settings when using source word features

Hi, I am using OpenNMT-py to build a machine translation model. I would like to add some annotations to the source language, and it looks like the word features functionality of OpenNMT-py could meet this requirement. However, I am running into problems and I suspect I am not using it correctly. Here are my configurations.
My training data looks like:
Source: they│0 are│0 not│0 even│0 100│0 metres│0 apart│0 :│0 on│0 Tuesday│0 ,│0 the│0 new│0 B│1 33│1 Bundesstraße│2 33│2 pedestrian│0 lights│0 in│0 Dorfparkplatz│0 in│0 Gutach│1 Gutach│2 (Schwarzwaldbahn)│2 became│0 operational│0 -│0 within│0 view│0 of│0 the│0 existing│0 Town│1 Hall│1 Bremer│2 Rathaus│2 traffic│0 lights│0 .│0

Target: Sie stehen keine 100 Meter voneinander entfernt : am Dienstag ist in Gutach die neue B 33 @-@ Fußgängerampel am Dorfparkplatz in Betrieb genommen worden - in Sichtweite der älteren Rathausampel .
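
Each space-separated token carries one feature value, attached with the "│" separator that OpenNMT-py uses for source word features. Just to make the format explicit, here is a minimal sketch of how such a line can be assembled (the token and label lists below are only an example, not my real annotation pipeline):

# Sketch: build one feature-annotated source line for OpenNMT-py word features.
# One feature value per whitespace token, joined with the "│" separator.
tokens = ["they", "are", "not", "even", "100", "metres", "apart", ":"]
labels = ["0", "0", "0", "0", "0", "0", "0", "0"]  # example labels, one per token

annotated = " ".join(f"{tok}│{lab}" for tok, lab in zip(tokens, labels))
print(annotated)  # they│0 are│0 not│0 even│0 100│0 metres│0 apart│0 :│0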

I first trained SentencePiece on the unannotated data and then used build_vocab.py to build the vocabulary with this config:

save_data: /raid/lyu/en_de/wmt2017ss
overwrite: True

# Vocab
src_vocab: /raid/lyu/en_de/wmt2017ss/spm1.vocab
tgt_vocab: /raid/lyu/en_de/wmt2017ss/spm1.vocab
src_vocab_size: 41000
tgt_vocab_size: 41000
vocab_size_multiple: 8
src_words_min_frequency: 2
tgt_words_min_frequency: 2
share_vocab: True
n_sample: 0
src_subword_model: /raid/lyu/en_de/wmt2017/sentenpiece/spm.model
tgt_subword_model: /raid/lyu/en_de/wmt2017/sentenpiece/spm.model

# Filter
src_seq_length: 96
tgt_seq_length: 96

# Corpus opts
data:
    corpus_1:
        path_src: /raid/lyu/en_de/wmt2017ss/train.src
        path_tgt: /raid/lyu/en_de/wmt2017ss/train.trg
        transforms: [sentencepiece, inferfeats]
        weight: 1
    valid:
        path_src: /raid/lyu/en_de/wmt2017ss/dev.src
        path_tgt: /raid/lyu/en_de/wmt2017ss/dev.trg
        transforms: [sentencepiece, inferfeats]

n_src_feats: 1
feat_vec_exponent: 1
src_feats_defaults: "0"
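
For reference, the spm.model used above was trained beforehand on the raw, unannotated text. A rough sketch of that step with the sentencepiece Python package (the input file name and the exact options here are illustrative placeholders, not my real ones):

import sentencepiece as spm

# Train the subword model on the raw (unannotated) training text,
# one sentence per line. The input path and options are placeholders.
spm.SentencePieceTrainer.train(
    input="/raid/lyu/en_de/wmt2017/train.raw",
    model_prefix="/raid/lyu/en_de/wmt2017/sentenpiece/spm",
    vocab_size=41000,
    character_coverage=1.0,
)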

Then I ran train.py with the following config:

# Corpus opts
data:
    corpus_1:
        path_src: /raid/lyu/en_de/wmt2017ss/train.src
        path_tgt: /raid/lyu/en_de/wmt2017ss/train.trg
        transforms: [sentencepiece, inferfeats]
        weight: 1
    valid:
        path_src: /raid/lyu/en_de/wmt2017ss/dev.src
        path_tgt: /raid/lyu/en_de/wmt2017ss/dev.trg
        transforms: [sentencepiece, inferfeats]

n_src_feats: 1
feat_vec_exponent: 1
src_feats_defaults: "0"
feat_merge: "concat"

# Model configuration
save_model: /raid/lyu/en_de/wmt2017ss/bigwmt17
reversible_tokenization: "spacer"
log_file: /raid/lyu/en_de/wmt2017ss/logs.txt
keep_checkpoint: 50
save_checkpoint_steps: 5000
average_decay: 0
seed: 1
report_every: 1000
train_steps: 50000
valid_steps: 1000
train_eval_steps: 1000
eval_eval_steps: 1000
train_metrics: [BLEU, TER]
valid_metrics: [BLEU, TER]
bucket_size: 262144
num_workers: 2
prefetch_factor: 400
world_size: 2
gpu_ranks: [0,1]
batch_type: "tokens"
batch_size: 2500
valid_batch_size: 4096
batch_size_multiple: 8
accum_count: [10]
accum_steps: [0]
model_dtype: "fp16"
apex_opt_level: "O2"
optim: "adam"
learning_rate: 2
warmup_steps: 4000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 16
hidden_size: 1024
word_vec_size: 1016
transformer_ff: 4096
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
share_decoder_embeddings: true
share_embeddings: true
position_encoding: true

But I got the error:

[2023-06-20 12:57:31,712 INFO] Starting process pid: 349220
[2023-06-20 12:57:31,738 INFO] Starting process pid: 349221
[2023-06-20 12:57:33,314 INFO] Parsed 2 corpora from -data.
[2023-06-20 12:57:33,315 INFO] Get special vocabs from Transforms: {'src': [], 'tgt': []}.
[2023-06-20 12:57:33,442 INFO] The first 10 tokens of the vocabs are:['<unk>', '<blank>', '<s>', '</s>', '▁,', '▁.', '▁the', '▁in', '▁die', '▁of']
[2023-06-20 12:57:33,443 INFO] The decoder start token is: <s>
[2023-06-20 12:57:33,443 INFO] Building model…
[2023-06-20 12:57:37,053 INFO] NMTModel(
(encoder): TransformerEncoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(41000, 1016, padding_idx=1)
(1): Embedding(8, 8, padding_idx=1)
)
(pe): PositionalEncoding()
)
(dropout): Dropout(p=0.1, inplace=False)
)
(transformer): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(2): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(3): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(4): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(5): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(41000, 1016, padding_idx=1)
)
(pe): PositionalEncoding()
)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(transformer_layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(1): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(2): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(3): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(4): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
(5): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=1024, out_features=4096, bias=True)
(w_2): Linear(in_features=4096, out_features=1024, bias=True)
(layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.1, inplace=False)
(dropout_2): Dropout(p=0.1, inplace=False)
)
(layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.1, inplace=False)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=1024, out_features=1024, bias=False)
(linear_values): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=False)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.1, inplace=False)
(final_linear): Linear(in_features=1024, out_features=1024, bias=False)
)
(layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
)
)
)
(generator): Linear(in_features=1024, out_features=41000, bias=True)
)
[2023-06-20 12:57:37,056 INFO] encoder: 117210880
[2023-06-20 12:57:37,056 INFO] decoder: 100773928
[2023-06-20 12:57:37,056 INFO] * number of parameters: 217984808
[2023-06-20 12:57:37,056 INFO] * src vocab size = 41000
[2023-06-20 12:57:37,056 INFO] * tgt vocab size = 41000
[2023-06-20 12:57:37,056 INFO] * src_feat 0 vocab size = 8
[2023-06-20 12:57:37,154 INFO] Starting training on GPU: [0, 1]
[2023-06-20 12:57:37,154 INFO] Start training loop and validate every 1000 steps…
[2023-06-20 12:57:37,155 INFO] Scoring with: TransformPipe(SentencePieceTransform(share_vocab=True, src_subword_model=/raid/lyu/en_de/wmt2017/sentenpiece/spm.model, tgt_subword_model=/raid/lyu/en_de/wmt2017/sentenpiece/spm.model, src_subword_alpha=0, tgt_subword_alpha=0, src_subword_vocab=, tgt_subword_vocab=, src_vocab_threshold=0, tgt_vocab_threshold=0, src_subword_nbest=1, tgt_subword_nbest=1), InferFeatsTransform())
[2023-06-20 12:57:38,759 INFO] Weighted corpora loaded so far:
* corpus_1: 1
[2023-06-20 12:57:38,759 INFO] Weighted corpora loaded so far:
* corpus_1: 1
[2023-06-20 12:57:40,179 INFO] Weighted corpora loaded so far:
* corpus_1: 1
[2023-06-20 12:57:40,179 INFO] Weighted corpora loaded so far:
* corpus_1: 1
Traceback (most recent call last):
File “/raid/lyu/OpenNMT-py/onmt/trainer.py”, line 503, in _gradient_accumulation
model_out, attns = self.model(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/models/model.py”, line 69, in forward
dec_out, attns = self.decoder(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 481, in forward
dec_out, attn, attn_align = layer(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 111, in forward
layer_out, attns = self._forward(*args, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 268, in _forward
layer_in_norm = self.layer_norm_1(layer_in)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/normalization.py”, line 190, in forward
return F.layer_norm(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/functional.py”, line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[112, 21, 1016]
Traceback (most recent call last):
File “/raid/lyu/OpenNMT-py/onmt/trainer.py”, line 503, in _gradient_accumulation
model_out, attns = self.model(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/models/model.py”, line 69, in forward
dec_out, attns = self.decoder(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 481, in forward
dec_out, attn, attn_align = layer(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 111, in forward
layer_out, attns = self._forward(*args, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 268, in _forward
layer_in_norm = self.layer_norm_1(layer_in)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/normalization.py”, line 190, in forward
return F.layer_norm(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/functional.py”, line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[112, 21, 1016]
Traceback (most recent call last):
File “/raid/lyu/OpenNMT-py/onmt/bin/train.py”, line 71, in <module>
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7fe6356c2830>
Traceback (most recent call last):
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/utils/data/dataloader.py”, line 1466, in __del__
main()
File “/raid/lyu/OpenNMT-py/onmt/bin/train.py”, line 67, in main
train(opt)
File “/raid/lyu/OpenNMT-py/onmt/bin/train.py”, line 49, in train
p.join()
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/process.py”, line 149, in join
res = self._popen.wait(timeout)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/popen_fork.py”, line 43, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/popen_fork.py”, line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
File “/raid/lyu/OpenNMT-py/onmt/utils/distributed.py”, line 159, in signal_handler
raise Exception(msg)
Exception:

-- Tracebacks above this line can probably
be ignored --

Traceback (most recent call last):
File “/raid/lyu/OpenNMT-py/onmt/utils/distributed.py”, line 171, in consumer
process_fn(opt, device_id=device_id)
File “/raid/lyu/OpenNMT-py/onmt/train_single.py”, line 227, in main
trainer.train(
File “/raid/lyu/OpenNMT-py/onmt/trainer.py”, line 318, in train
self._gradient_accumulation(
File “/raid/lyu/OpenNMT-py/onmt/trainer.py”, line 567, in _gradient_accumulation
raise exc
File “/raid/lyu/OpenNMT-py/onmt/trainer.py”, line 503, in _gradient_accumulation
model_out, attns = self.model(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/models/model.py”, line 69, in forward
dec_out, attns = self.decoder(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 481, in forward
dec_out, attn, attn_align = layer(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 111, in forward
layer_out, attns = self._forward(*args, **kwargs)
File “/raid/lyu/OpenNMT-py/onmt/decoders/transformer.py”, line 268, in _forward
layer_in_norm = self.layer_norm_1(layer_in)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1194, in _call_impl
return forward_call(*input, **kwargs)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/modules/normalization.py”, line 190, in forward
return F.layer_norm(
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/nn/functional.py”, line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[1024], expected input with shape [*, 1024], but got input of size[112, 21, 1016]

self._shutdown_workers()

File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/site-packages/torch/utils/data/dataloader.py”, line 1430, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/process.py”, line 149, in join
res = self._popen.wait(timeout)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/popen_fork.py”, line 40, in wait
if not wait([self.sentinel], timeout):
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/connection.py”, line 931, in wait
Process SpawnProcess-1:
Traceback (most recent call last):
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/process.py”, line 314, in _bootstrap
self.run()
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/multiprocessing/process.py”, line 108, in run
self._target(*self._args, **self._kwargs)
KeyboardInterrupt
ready = selector.select(timeout)
File “/raid_elmo/home/lr/lyu/conda_env/apex/lib/python3.10/selectors.py”, line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt:

I'm new to OpenNMT and I'm sorry I don't really know how to configure it properly yet. According to the error message, it looks like the decoder input does not get the additional word-feature embedding, so its dimension no longer matches the hidden size.
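
To put numbers on it (only restating the sizes from the model printout above): with feat_merge "concat" the feature embedding is concatenated onto the source word embedding, but nothing comparable happens on the target side:

# Sizes as printed in the log above.
hidden_size = 1024
word_vec_size = 1016
feat_vec_size = 8        # src_feat 0: Embedding(8, 8)

# Encoder with feat_merge "concat": word and feature embeddings are concatenated.
enc_emb = word_vec_size + feat_vec_size   # 1016 + 8 = 1024, matches hidden_size
print(enc_emb == hidden_size)             # True

# Decoder has no word features, so its embedding stays at word_vec_size.
dec_emb = word_vec_size                   # 1016
print(dec_emb == hidden_size)             # False -> the LayerNorm((1024,)) error above

Any advice on the correct configuration would be appreciated.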