Strange learning rate behaviour

With the training below, I used the standard learning rate management (no special parameters). I stopped the training several times and restarted it with the -continue option. I wonder why there is a part with a constant learning rate (see the blue curve between epochs 6 and 9). Is this normal behaviour?

Here is the command line:

th train.lua -continue -gpuid 2 -train_from "$dataPath$model" -end_epoch 100 -src_word_vec_size 200,3 -tgt_word_vec_size 200,3 -layers 2 -rnn_size 1000 -max_batch_size 50 -data "$dataPath"onmt-fr-en-train.t7 -save_model "$dataPath"onmt-fr-en-modelSTD >> "$dataPath"LOG-fr-en-STD.txt 2>&1

We can count this as a bug. The initial learning rate decayed because the validation perplexity went up, but this information is forgotten even with -continue.
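To make the decay rule concrete, here is a minimal sketch (in Python, not actual OpenNMT code) of a scheduler that halves the LR whenever the validation perplexity rises compared to the previous epoch. The function name, decay factor, and perplexity values are all made up for illustration; the point is that the comparison needs the *previous* perplexity, which is exactly the state that gets lost on restart.

```python
def next_lr(lr, prev_ppl, curr_ppl, decay=0.5):
    """Halve the LR when validation perplexity stops improving."""
    if prev_ppl is not None and curr_ppl > prev_ppl:
        return lr * decay
    return lr

lr = 1.0
ppls = [10.0, 9.0, 9.5, 9.2, 9.6]  # made-up validation perplexities
prev = None
history = []
for ppl in ppls:
    lr = next_lr(lr, prev, ppl)
    history.append(lr)
    prev = ppl
# history → [1.0, 1.0, 0.5, 0.5, 0.25]: the LR halves only on the
# epochs where perplexity went up (9.5 and 9.6 here).
```

If the checkpoint stores only `lr` but not `prev`, a restarted run cannot perform the comparison for its first epoch.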

Ok. I have to investigate more precisely with my restart times…

Ok, it’s coherent. Each time ONMT is restarted, it uses the last known LR: the one calculated after the end of the last saved epoch. It then waits for the next epoch with this LR to see whether the validation PPL is increasing, in order to decide if it must decrease the LR.

Epoch 6 was restarted from model 5. At the end of epoch 5, the LR had been lowered and saved in model 5, so the LR of epoch 6 is lower than that of epoch 5. Epoch 7 was restarted from model 6. At the end of epoch 6, the trainer was still waiting for one more epoch to see the next validation PPL, so model 6 was saved with an unchanged LR. The same happened at epoch 8, restarted from model 7. At epoch 9, the validation PPL was decreasing, so again the LR stayed identical. At epoch 10, the validation PPL was increasing, causing the LR to decrease again.
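The restart effect described above can be simulated with a short sketch (hypothetical Python, not OpenNMT itself). Assuming the checkpoint stores the LR but not the previous validation perplexity, restarting after every epoch means the scheduler never has two perplexities to compare, so the LR plateaus even when perplexity keeps rising:

```python
def run_epochs(lr, ppls, prev_ppl=None, decay=0.5):
    """Run the decay-on-increase rule over a list of val perplexities."""
    lrs = []
    for ppl in ppls:
        if prev_ppl is not None and ppl > prev_ppl:
            lr *= decay
        lrs.append(lr)
        prev_ppl = ppl
    return lr, lrs

# Uninterrupted run over rising perplexities: LR halves each epoch.
_, uninterrupted = run_epochs(1.0, [9.0, 9.1, 9.2, 9.3])

# Restarted after every epoch: prev_ppl is reset to None each time,
# so the comparison never fires and the LR stays flat.
lr = 1.0
restarted = []
for ppl in [9.0, 9.1, 9.2, 9.3]:
    lr, step = run_epochs(lr, [ppl])  # fresh prev_ppl=None per restart
    restarted += step
# uninterrupted → [1.0, 0.5, 0.25, 0.125]
# restarted     → [1.0, 1.0, 1.0, 1.0]
```

This reproduces the flat segment between epochs 6 and 9: each restart inherits the saved LR but must wait one full epoch before it can judge the perplexity trend again.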

Gray vertical lines show the restart epochs here:
