I am trying to train a ConvS2S model and I am getting this error: RuntimeError: value cannot be converted to type float without overflow: (-7.65404e-23,1.25e-06)
How can I handle this issue? I know there was a similar discussion at https://github.com/OpenNMT/OpenNMT-py/issues/491, but I don’t quite understand how to apply the suggested ‘replacing ‘inf’ with 1e-18’, or whether it is the right solution for my case. Thanks in advance!
error log:
[2019-07-21 13:29:53,829 INFO] Starting training on GPU: [0, 1]
[2019-07-21 13:29:53,829 INFO] Start training loop and validate every 100 steps…
[2019-07-21 13:29:55,800 INFO] Loading dataset from conv_syllable_underbar/.train.0.pt
[2019-07-21 13:29:55,975 INFO] number of examples: 23999
Traceback (most recent call last):
  File "train.py", line 200, in
    main(opt)
  File "train.py", line 82, in main
    p.join()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "train.py", line 184, in signal_handler
    raise Exception(msg)
Exception:
-- Tracebacks above this line can probably
be ignored --
Traceback (most recent call last):
  File "/home/users/woody/transliteration/OpenNMT-py/train.py", line 142, in run
    single_main(opt, device_id, batch_queue, semaphore)
  File "/home/users/woody/transliteration/OpenNMT-py/onmt/train_single.py", line 143, in main
    valid_steps=opt.valid_steps)
  File "/home/users/woody/transliteration/OpenNMT-py/onmt/trainer.py", line 243, in train
    report_stats)
  File "/home/users/woody/transliteration/OpenNMT-py/onmt/trainer.py", line 409, in _gradient_accumulation
    self.optim.step()
  File "/home/users/woody/transliteration/OpenNMT-py/onmt/utils/optimizers.py", line 340, in step
    self.optimizer.step()
  File "/home/users/woody/.local/lib/python3.5/site-packages/torch/optim/adam.py", line 107, in step
    p.data.addcdiv(-step_size, exp_avg, denom)
RuntimeError: value cannot be converted to type float without overflow: (-7.65404e-23,1.25e-06)
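A note for anyone debugging the same message: the pair in parentheses looks like a complex number printed as (real, imaginary). If so, a scalar reaching the optimizer (for example the step size) may have been complex rather than a plain float. That reading is an assumption and is not confirmed anywhere in this thread. Below is a minimal Python sketch of how such a value can arise and how to spot it before it reaches the optimizer. The helper `describe_scalar` is hypothetical and is not part of OpenNMT-py or PyTorch.

```python
import math


def describe_scalar(value):
    """Classify a Python scalar as 'ok', 'complex', or 'non-finite'.

    Hypothetical helper for illustration only; not an OpenNMT-py or PyTorch API.
    """
    # Check for complex first: math.isfinite() does not accept complex numbers.
    if isinstance(value, complex):
        return "complex"
    if not math.isfinite(value):
        return "non-finite"
    return "ok"


# In Python 3, a negative base raised to a fractional power returns a
# complex number instead of raising an error.
negative_base = -4.0
suspicious = negative_base ** 0.5

print(describe_scalar(1e-3))          # an ordinary learning rate
print(describe_scalar(float("inf")))  # an overflowed value
print(describe_scalar(suspicious))    # a complex result
```

Printing or classifying the learning rate and step size this way just before the optimizer step can help narrow down which training option produces the bad value.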
I tested your command and got the same error. However, when I removed every option except the “cnn” model type that you want to use, training managed to start.
Thanks for the reply yasmin! The code itself works now but…
Did you actually check the final perplexity and accuracy of the best model, though? It seems to me that the model isn’t being trained at all. For example, the Transformer ran for about 2,500 steps and its best model reached 90% accuracy on the validation set, whereas ConvS2S ran for only 50 steps and reached about 17% accuracy. (The src & trn training and validation sets are the same in both runs, of course.)
You are welcome! Could you please mark that reply as the “solution” in case someone else has the same issue?
No, I just made sure the original error was gone.
Do you mean that training stopped because of “early stopping”? If so, try disabling that option and see whether accuracy improves with more training steps.
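To illustrate why early stopping can end a run after only a few validation checks, here is a minimal sketch of patience-based early stopping in plain Python. The class `EarlyStopper`, the function `steps_run`, and the accuracy values are all hypothetical and made up for illustration. This is not OpenNMT-py's implementation.

```python
class EarlyStopper:
    """Patience-based early stopping on a validation metric (higher is better).

    Hypothetical illustration; not OpenNMT-py's implementation.
    """

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0  # consecutive validations without improvement

    def should_stop(self, metric):
        if metric > self.best:
            self.best = metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def steps_run(accuracies, valid_every, patience):
    """Return the training step at which the run ends."""
    stopper = EarlyStopper(patience)
    for check, accuracy in enumerate(accuracies, start=1):
        if stopper.should_stop(accuracy):
            return check * valid_every
    return len(accuracies) * valid_every


# Made-up validation accuracies: a slow start followed by later improvement.
accuracies = [0.17, 0.16, 0.15, 0.30, 0.55]

print(steps_run(accuracies, valid_every=10, patience=2))   # stops early
print(steps_run(accuracies, valid_every=10, patience=10))  # runs to the end
```

With a small patience, a model that starts slowly is stopped before it reaches the later improvement. That is why disabling the option, or raising the patience, is worth trying first.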
The best way to find the recommended options for a model (other than experimenting yourself) is to check the original paper. According to the OpenNMT-py code, the CNN encoder/decoder is an implementation of the paper “Convolutional Sequence to Sequence Learning”. The paper evaluates the architecture mainly on machine translation and also reports a summarization experiment, so there is much to learn from its training settings.
If you are still in doubt, you can post a “new topic” asking for the best ConvS2S options for machine translation; hopefully, other colleagues can help.