After the build-vocab step, when I start NMT training, I get the error below:
Loading vocab from text file…
[2021-02-09 04:02:27,961 INFO] Loading src vocabulary from data/en-to-bn-aai4b-1.1/en-to-bn-aai4b-1.1.vocab.src
[2021-02-09 04:02:28,080 INFO] Loaded src vocab has 41327 tokens.
Traceback (most recent call last):
  File "/home/ubuntu/python_virtual_env/nmt_training_env/bin/onmt_train", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/bin/train.py", line 169, in main
    train(opt)
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/bin/train.py", line 103, in train
    checkpoint, fields, transforms_cls = _init_train(opt)
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/bin/train.py", line 80, in _init_train
    fields, transforms_cls = prepare_fields_transforms(opt)
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/bin/train.py", line 34, in prepare_fields_transforms
    opt, src_specials=specials['src'], tgt_specials=specials['tgt'])
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/inputters/fields.py", line 34, in build_dynamic_fields
    min_freq=opts.src_words_min_frequency)
  File "/home/ubuntu/python_virtual_env/nmt_training_env/lib/python3.6/site-packages/onmt/inputters/inputter.py", line 309, in _load_vocab
    for token, count in vocab:
ValueError: not enough values to unpack (expected 2, got 1)
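Judging from the failing line (`for token, count in vocab:`), the loader seems to expect every line of the vocab file to unpack into a (token, count) pair. As a quick sanity check I put together this snippet to flag lines that don't have exactly two whitespace-separated fields (the path is the one from my log; the two-field format is my assumption, inferred from the traceback):

```python
# Flag vocab lines that do not split into exactly two fields
# (token and count) -- my assumption based on the failing unpack
# in onmt/inputters/inputter.py.
vocab_path = "data/en-to-bn-aai4b-1.1/en-to-bn-aai4b-1.1.vocab.src"  # path from my log

with open(vocab_path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        if len(line.split()) != 2:
            print(f"suspect line {lineno}: {line!r}")
```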
I haven't been able to figure out why this is happening. My config is below:
data:
    corpus:
        path_src: data/src_train.txt
        path_tgt: data/tgt_train.txt
        transforms: [sentencepiece, filtertoolong]
    valid:
        path_src: data/src_dev.txt
        path_tgt: data/tgt_dev.txt
        transforms: [sentencepiece, filtertoolong]

src_subword_model: model/sentencepiece_models/en_32000.model
tgt_subword_model: model/sentencepiece_models/ta_32000.model
src_seq_length: 200
tgt_seq_length: 200
skip_empty_level: silent

save_model: model/model_
save_checkpoint_steps: 10000
train_steps: 150000
valid_steps: 10000

tensorboard: true
tensorboard_log_dir: runs/onmt

world_size: 1
gpu_ranks: [0]

batch_type: "tokens"
batch_size: 4096
max_generator_batches: 2
accum_count: [2]
normalization: "tokens"

optim: "adam"
learning_rate: 0.25
adam_beta2: 0.998
decay_method: "noam"
warmup_steps: 8000
max_grad_norm: 0

param_init: 0
param_init_glorot: true
label_smoothing: 0.1

encoder_type: transformer
decoder_type: transformer
layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout: [0.1]
position_encoding: true
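In case the problem really does turn out to be count-less lines in the vocab file, this is the kind of throwaway normalization I'd try before re-running onmt_train (the dummy count of 1 and the `.fixed` output name are just my placeholders, not anything OpenNMT requires):

```python
# Hypothetical workaround (my assumption: some lines lack a count field).
# Rewrites the vocab so every line is "token<TAB>count", using a dummy
# count of 1 where the count is missing; blank lines are dropped.
src = "data/en-to-bn-aai4b-1.1/en-to-bn-aai4b-1.1.vocab.src"  # path from my log
dst = src + ".fixed"                                          # placeholder name

with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
    for line in fin:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if len(line.split()) == 1:
            fout.write(line + "\t1\n")
        else:
            fout.write(line + "\n")
```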