I am training a multilingual NMT model for English to Indian languages on the WAT 2021 Indic multilingual shared task data. After 800 training steps, I got the following error:
Traceback (most recent call last):
File “/home/ram/anaconda3/bin/onmt_train”, line 33, in <module>
sys.exit(load_entry_point(‘OpenNMT-py==2.0.0rc2’, ‘console_scripts’, ‘onmt_train’)())
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/bin/train.py”, line 169, in main
train(opt)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/bin/train.py”, line 148, in train
p.join()
File “/home/ram/anaconda3/lib/python3.8/multiprocessing/process.py”, line 149, in join
res = self._popen.wait(timeout)
File “/home/ram/anaconda3/lib/python3.8/multiprocessing/popen_fork.py”, line 47, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File “/home/ram/anaconda3/lib/python3.8/multiprocessing/popen_fork.py”, line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/utils/distributed.py”, line 162, in signal_handler
raise Exception(msg)
Exception:
– Tracebacks above this line can probably
Traceback (most recent call last):
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/utils/distributed.py”, line 208, in consumer
process_fn(opt, device_id=device_id,
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/train_single.py”, line 102, in main
trainer.train(
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/trainer.py”, line 242, in train
self._gradient_accumulation(
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/trainer.py”, line 366, in _gradient_accumulation
outputs, attns = self.model(
File “/home/ram/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/models/model.py”, line 45, in forward
enc_state, memory_bank, lengths = self.encoder(src, lengths)
File “/home/ram/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/models/model.py”, line 45, in forward
enc_state, memory_bank, lengths = self.encoder(src, lengths)
File “/home/ram/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/encoders/transformer.py”, line 121, in forward
emb = self.embeddings(src)
File “/home/ram/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/modules/embeddings.py”, line 244, in forward
source = module(source, step=step)
File “/home/ram/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/ram/anaconda3/lib/python3.8/site-packages/OpenNMT_py-2.0.0rc2-py3.8.egg/onmt/modules/embeddings.py”, line 52, in forward
emb = emb + self.pe[:emb.size(0)]
RuntimeError: The size of tensor a (6021) must match the size of tensor b (5000) at non-singleton dimension 0
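
Reading the last line, it looks like a source sequence of 6021 tokens was added to a positional-encoding table that only holds 5000 positions, so the addition at `embeddings.py` line 52 fails. To check whether the training data really contains such long lines, here is a minimal sketch in plain Python. The helper names `find_long_lines` and `filter_pairs` are hypothetical and not part of OpenNMT-py, and the sketch assumes the files are already tokenized the same way the model sees them, with tokens separated by whitespace.

```python
# Hypothetical helpers for inspecting parallel data; not part of OpenNMT-py.
# Assumption: each line is already tokenized and tokens are whitespace-separated.

MAX_POSITIONS = 5000  # size of "tensor b" in the error message


def find_long_lines(lines, max_len=MAX_POSITIONS):
    """Return (line_number, token_count) for every line longer than max_len tokens."""
    too_long = []
    for number, line in enumerate(lines, start=1):
        count = len(line.split())
        if count > max_len:
            too_long.append((number, count))
    return too_long


def filter_pairs(src_lines, tgt_lines, max_len=MAX_POSITIONS):
    """Keep only sentence pairs where both sides have at most max_len tokens."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        if len(src.split()) <= max_len and len(tgt.split()) <= max_len:
            kept.append((src, tgt))
    return kept
```

If `find_long_lines` reports any lines, those pairs are probably misaligned or unsegmented text and could be dropped with `filter_pairs` before training. As far as I can tell, OpenNMT-py 2.x also documents a `filtertoolong` transform for this purpose, but I have not confirmed how it behaves in 2.0.0rc2.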