I’ve tried setting
max_grad_norm to different values. According to the documentation:
Clip the gradients L2-norm to this value. Set to 0 to disable.
However, I’m picking these values without much reasoning behind them.
Would it be a good strategy to log the gradient norms during training and use something like their average to choose the clipping value? I also wonder whether there’s a way to access each gradient during the training phase.
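In plain PyTorch, gradients are available on each parameter's `.grad` after `loss.backward()`, so both per-parameter norms and the total norm that `max_grad_norm` clips can be logged each step. A minimal sketch (the toy model, data, and loop are placeholders, not your actual setup):

```python
import torch
import torch.nn as nn

# Toy model and data standing in for the real training setup.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# After backward(), each parameter's .grad holds its gradient;
# its L2 norm can be logged per step to build up statistics.
per_param_norms = {
    name: p.grad.norm(2).item()
    for name, p in model.named_parameters()
    if p.grad is not None
}

# The quantity max_grad_norm clips is the L2 norm over ALL parameter
# gradients treated as one flat vector, i.e. sqrt of the sum of squares
# of the per-parameter norms.
total_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters() if p.grad is not None]),
    2,
).item()

print(per_param_norms)
print(total_norm)
```

Note that `torch.nn.utils.clip_grad_norm_` itself returns the total norm *before* clipping, so calling it each step (even with a large threshold) is a convenient way to collect the statistic you'd average.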