Interpreting the gradients global norm in TensorBoard

Nart · March 17, 2021, 12:20pm

Hello,
I tried the following, but I can't make sense of what the charts show for the gradients global norm. It dropped very low, but that wasn't reflected in the BLEU results (I only got a slight boost in BLEU). Maybe I am doing something wrong?

I averaged the last two checkpoints and continued training from the averaged checkpoint, then repeated the same thing every ~10k steps, i.e.:
10k + 20k > avg 20k, then training until 30k
avg 20k + 30k > avg 30k, then training until 40k
avg 30k + 40k > avg 40k, then training until 50k, and so on
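
For readers unfamiliar with checkpoint averaging, the schedule above boils down to taking the element-wise mean of the saved weights. The snippet below is only a minimal sketch of that idea, assuming each checkpoint can be viewed as a mapping from variable name to a flat list of floats. The `average_checkpoints` function and the toy values are hypothetical and are not OpenNMT-tf's own implementation.

```python
def average_checkpoints(checkpoints):
    """Return the element-wise mean of several checkpoints.

    Each checkpoint is assumed to be a dict mapping a variable name to a
    flat list of floats (a simplification of real model weights).
    """
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this variable across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / len(checkpoints) for values in columns]
    return averaged


# Toy stand-ins for the checkpoints saved at 10k and 20k steps.
ckpt_10k = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_20k = {"w": [3.0, 4.0], "b": [1.0]}

avg_20k = average_checkpoints([ckpt_10k, ckpt_20k])
print(avg_20k)
```

In the schedule above, training would then resume from `avg_20k`, and the next average would combine it with the checkpoint saved at 30k.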

Here are the charts:

guillaumekln · March 17, 2021, 12:54pm

Hi,

Did you update OpenNMT-tf in the meantime?

Nart · March 18, 2021, 9:33am

Yes, I am using the latest OpenNMT-tf 2.17.0 on pip.

guillaumekln · March 18, 2021, 9:42am

The latest version changed when the gradients are normalized. Now they are normalized before computing the global norm. It does not impact the training, but the norm reported in TensorBoard appears smaller.
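
To illustrate why the reported value shrinks, here is a small sketch with made-up numbers. It assumes the normalization is a division of every gradient by a single constant (for example a batch or token count). The thread does not state the actual factor, so `normalization` below is hypothetical, and this is not OpenNMT-tf's code.

```python
import math


def global_norm(gradients):
    """Global norm: square root of the sum of squares over all gradient values."""
    return math.sqrt(sum(g * g for grad in gradients for g in grad))


def normalize(gradients, factor):
    """Divide every gradient value by the same constant factor."""
    return [[g / factor for g in grad] for grad in gradients]


# Toy gradients for two variables (made-up values).
gradients = [[3.0, 4.0], [12.0]]
normalization = 100.0  # hypothetical normalization factor

# Norm computed before normalization (what older versions would report).
norm_before = global_norm(gradients)

# Norm computed after normalization (what newer versions would report).
norm_after = global_norm(normalize(gradients, normalization))

print(norm_before, norm_after)
```

Under this assumption, dividing all gradients by a constant divides the global norm by that same constant, so the curve in TensorBoard drops even though the update applied to the weights is unchanged.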