Ok, it’s coherent. Each time ONMT is restarted, it uses the last known LR: the one computed after the end of the last saved epoch. It then waits for the next epoch at this LR to see whether the validation PPL increases, in order to decide whether it must decrease the LR.
Epoch 6 was restarted from model 5. At the end of epoch 5, the LR had been lowered and saved in model 5, so epoch 6’s LR is lower than epoch 5’s. Epoch 7 was restarted from model 6. At the end of epoch 6, it was waiting one more epoch to see the next validation PPL, so model 6 was saved with an unchanged LR. The same applies at epoch 8, restarted from model 7. At epoch 9, the validation PPL was decreasing, so again the LR stayed the same. At epoch 10, the validation PPL was increasing, causing the LR to decrease again.
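The decision rule described above can be sketched like this (a minimal illustration, not ONMT’s actual code; the function name `next_lr` and the `decay` factor are assumptions):

```python
def next_lr(lr, prev_val_ppl, val_ppl, decay=0.5):
    """Return the LR to save with the checkpoint at the end of an epoch.

    prev_val_ppl is None on the first measured epoch, when there is
    no previous validation PPL to compare against.
    """
    if prev_val_ppl is not None and val_ppl > prev_val_ppl:
        # Validation PPL increased -> decay the LR before saving the model.
        return lr * decay
    # First epoch, or PPL still decreasing -> keep the LR unchanged.
    return lr
```

With this rule, a checkpoint always carries the LR decided *after* its own epoch, which is exactly why a restart from model N begins epoch N+1 with that (possibly already lowered) LR.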
Gray vertical lines show the restart epochs here: