What's the difference between model.zero_grad() and optim.zero_grad()

What’s the difference between model.zero_grad() and optim.zero_grad()?
It seems that we often use the second one.

That’s not specific to OpenNMT. See:

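tl;dr: they are the same, as long as the optimizer was built from model.parameters(). In that case both reset the gradients of the same parameter tensors. If the optimizer only owns a subset of the parameters, optim.zero_grad() only clears that subset.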
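A minimal sketch, assuming PyTorch, that checks this on a toy nn.Linear. The names model, optim, weight_only, make_grads and is_cleared are illustrative only. Depending on the PyTorch version, a "cleared" gradient is either None (set_to_none=True, the default since 2.0) or a tensor of zeros, so the check accepts both.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny model; the optimizer is built over *all* of its parameters.
model = nn.Linear(4, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

def make_grads() -> None:
    """Forward/backward pass so every parameter gets a populated .grad."""
    model(torch.randn(3, 4)).sum().backward()

def is_cleared(p: torch.Tensor) -> bool:
    """True if a gradient was reset (None with set_to_none=True, else zeros)."""
    return p.grad is None or bool(torch.all(p.grad == 0))

# Option 1: clear gradients through the module.
make_grads()
model.zero_grad()
cleared_by_model = all(is_cleared(p) for p in model.parameters())

# Option 2: clear through the optimizer; same parameters, same effect.
make_grads()
optim.zero_grad()
cleared_by_optim = all(is_cleared(p) for p in model.parameters())

# Counter-example: an optimizer that only owns the weight leaves the bias alone.
weight_only = torch.optim.SGD([model.weight], lr=0.1)
make_grads()
weight_only.zero_grad()
bias_still_set = not is_cleared(model.bias)

print(cleared_by_model, cleared_by_optim, bias_still_set)  # True True True
```

The last case is the main reason to prefer one over the other. When several optimizers each own different parameter groups, calling zero_grad() on each optimizer (or once on the model) makes sure nothing is left with stale gradients.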
