Problems with learning rate decay

When I set -start_decay_steps 6084888 and -decay_steps 3042444 with -decay_method noam, I get this error:

RuntimeError: value cannot be converted to type float without overflow: (-7.65404e-27,1.25e-10)

in

/OpenNMT-py/onmt/utils/optimizers.py", line 281, in step
python3.7/site-packages/torch/optim/adam.py", line 107, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)

I use PyTorch 1.0 and the Adam optimizer with a learning rate of 0.0001. Any ideas?

First, these settings -start_decay_steps 6084888 -decay_steps 3042444 are not used in the Noam decay schedule.
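For context, the Noam schedule depends only on the step count, the model dimension, and the warmup steps, so those two flags have no effect on it. Below is a minimal sketch of how the rate is usually computed (not OpenNMT-py's exact code); the d_model and warmup_steps values are illustrative, not taken from this thread:

    # Minimal sketch of the Noam schedule; d_model and warmup_steps are
    # illustrative values, not settings reported in this thread.
    def noam_lr(step, base_lr=0.0001, d_model=512, warmup_steps=8000):
        step = max(step, 1)
        return base_lr * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)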

What is your current training step?

It is -train_steps 3100000000.

The learning rate becomes very small when training for that many steps. Maybe you want to switch to a constant learning rate at some point?
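Plugging the reported step count into the sketch above (again with illustrative d_model=512 and warmup_steps=8000) shows roughly how far the rate has decayed by then:

    # Evaluate the Noam formula at the reported step count (illustrative settings).
    base_lr, d_model, warmup_steps = 0.0001, 512, 8000
    step = 3_100_000_000
    lr = base_lr * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    print(lr)  # ~8e-11, effectively zero for training purposes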


I faced the same error while training my model:

Traceback (most recent call last):
    optimizer.step()
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/optim/adam.py", line 107, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)
RuntimeError: value cannot be converted to type float without overflow: (3.52033e-08,-1.14383e-08)

Can someone give me an idea?