Why is the NLL loss not normalized by the number of tokens?

I noticed that a lot of NMT implementations (including ONMT-Py) do not normalize the loss by the number of tokens (or by the batch size).
Is there some specific reason for this?
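For context, here is a minimal PyTorch sketch (not taken from the repository) contrasting the two conventions I mean: summing the NLL over all tokens versus dividing by the number of non-padding target tokens. The padding index and tensor shapes are just placeholders for illustration.

```python
# Hypothetical sketch of "summed" vs. "token-normalized" NLL, using standard PyTorch APIs.
import torch
import torch.nn.functional as F

PAD_IDX = 0  # assumed padding index, for illustration only

logits = torch.randn(8, 20, 500)          # dummy model output: (batch, seq_len, vocab)
target = torch.randint(1, 500, (8, 20))   # dummy target token ids

# Convention 1: NLL summed over all tokens in the batch (what many implementations return).
loss_sum = F.cross_entropy(
    logits.view(-1, logits.size(-1)), target.view(-1),
    ignore_index=PAD_IDX, reduction="sum")

# Convention 2: normalize by the number of non-padding target tokens.
num_tokens = target.ne(PAD_IDX).sum()
loss_per_token = loss_sum / num_tokens
```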

Line of code from the repository:

Check here: