What’s the difference between model.zero_grad() and optim.zero_grad()?
It seems that we often use the second one.
That’s not specific to OpenNMT. See:
tl;dr: they are the same, as long as the optimizer was created with all of the model's parameters.
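Here is a minimal sketch to check this yourself, assuming PyTorch is installed. The toy `nn.Linear` model and the `make_grads` / `grads_cleared` helpers are made up for illustration and are not part of OpenNMT. It also shows the one case where the two calls differ: an optimizer that only received some of the parameters.

```python
import torch
from torch import nn


def grads_cleared(module: nn.Module) -> bool:
    """True if every parameter's gradient is None or all zeros."""
    return all(
        p.grad is None or not bool(p.grad.any())
        for p in module.parameters()
    )


def make_grads(module: nn.Module) -> None:
    """Run one forward/backward pass so every parameter gets a gradient."""
    module(torch.randn(3, 4)).sum().backward()


model = nn.Linear(4, 2)
# The optimizer is given *all* of the model's parameters.
optim = torch.optim.SGD(model.parameters(), lr=0.1)

make_grads(model)
model.zero_grad()
cleared_by_model = grads_cleared(model)

make_grads(model)
optim.zero_grad()
cleared_by_optim = grads_cleared(model)

# Hypothetical counter-case: an optimizer that only knows about the weight.
partial_optim = torch.optim.SGD([model.weight], lr=0.1)
make_grads(model)
partial_optim.zero_grad()
# The bias was never handed to partial_optim, so its gradient is untouched.
bias_still_has_grad = model.bias.grad is not None and bool(model.bias.grad.any())

print(cleared_by_model, cleared_by_optim, bias_still_has_grad)
```

So if part of the model is frozen, or the parameters are split across several optimizers, `model.zero_grad()` is the call that is guaranteed to reset everything in the model.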