Interpreting the gradients' global norm in TensorBoard

Hello,
I tried the following, but I can't make sense of what the charts show for the gradients' global norm: it dropped very low, yet that drop was not reflected in the BLEU scores (I only got a slight boost in BLEU). Maybe I am doing something wrong?

I averaged the last two checkpoints and continued training from the averaged checkpoint, then repeated the same procedure every ~10k steps (a sketch of the averaging is below), i.e.:

- 10k + 20k → avg-20k, then train until 30k
- avg-20k + 30k → avg-30k, then train until 40k
- avg-30k + 40k → avg-40k, then train until 50k, and so on
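
For reference, here is a minimal sketch of what I mean by "averaging": an elementwise mean over the variables of two checkpoints. The checkpoint paths are made up, and OpenNMT-tf ships its own checkpoint averaging, so this is only to illustrate the operation:

```python
import numpy as np
import tensorflow as tf

# Hypothetical checkpoint prefixes -- substitute your own run directory.
ckpt_paths = ["run/ckpt-40000", "run/ckpt-50000"]

readers = [tf.train.load_checkpoint(path) for path in ckpt_paths]

# Average every floating-point variable elementwise; skip bookkeeping
# entries (step counters, the object graph) that cannot be averaged.
averaged = {}
for name in readers[0].get_variable_to_shape_map():
    tensors = [reader.get_tensor(name) for reader in readers]
    if np.issubdtype(np.asarray(tensors[0]).dtype, np.floating):
        averaged[name] = np.mean(tensors, axis=0)
```

Writing `averaged` back out as a checkpoint requires the model's variable structure, which is why the built-in averaging is the practical route.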

Here are the charts:

Hi,

Did you update OpenNMT-tf in the meantime?

Yes, I am using the latest OpenNMT-tf 2.17.0 from pip.

The latest version changed when the gradients are normalized: they are now normalized before the global norm is computed. This does not impact the training, but the norm reported in TensorBoard appears smaller.
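
Here is a small illustration of the effect (the scale factor is made up; the actual normalization in OpenNMT-tf depends on the configuration, e.g. gradient accumulation):

```python
import tensorflow as tf

grads = [tf.constant([3.0, 4.0]), tf.constant([0.0, 12.0])]
scale = 8.0  # illustrative normalization factor, e.g. accumulated batches

# Before the change: the global norm was computed on the raw gradients.
norm_raw = tf.linalg.global_norm(grads)  # sqrt(3^2 + 4^2 + 12^2) = 13.0

# After the change: gradients are normalized first, so the logged value
# shrinks by the same factor while the training update stays the same.
norm_normalized = tf.linalg.global_norm([g / scale for g in grads])  # 1.625

print(norm_raw.numpy(), norm_normalized.numpy())  # 13.0 1.625
```

So the drop you see in TensorBoard is a change in what is logged, not in what is trained.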