Guitaricet (Vlad Lyalin), September 22, 2019, 12:39am, #1
I noticed that a lot of NMT implementations (including OpenNMT-py) do not normalize the loss by the number of tokens (or by the batch size).
Is there a specific reason for this?
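To make the question concrete: by "normalize by the number of tokens" I mean something like the sketch below (the function and names are mine, not OpenNMT-py's), where the summed NLL is divided by the count of non-padding target tokens, so the loss scale does not depend on batch size or sentence length.

import torch
import torch.nn.functional as F

def per_token_nll(log_probs, target, pad_idx):
    # log_probs: (num_positions, vocab_size) log-probabilities,
    # target: (num_positions,) token indices, pad_idx marks padding.
    # Sum the NLL over all non-padding positions ...
    total = F.nll_loss(log_probs, target, ignore_index=pad_idx, reduction='sum')
    # ... then divide by the number of real target tokens.
    num_tokens = target.ne(pad_idx).sum().clamp(min=1)
    return total / num_tokens.float()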
The relevant code from the repository (build_loss_compute in onmt/utils/loss.py):
if opt.copy_attn:
    criterion = onmt.modules.CopyGeneratorLoss(
        len(tgt_field.vocab), opt.copy_attn_force,
        unk_index=unk_idx, ignore_index=padding_idx
    )
elif opt.label_smoothing > 0 and train:
    criterion = LabelSmoothingLoss(
        opt.label_smoothing, len(tgt_field.vocab), ignore_index=padding_idx
    )
elif isinstance(model.generator[-1], LogSparsemax):
    criterion = SparsemaxLoss(ignore_index=padding_idx, reduction='sum')
else:
    criterion = nn.NLLLoss(ignore_index=padding_idx, reduction='sum')

# if the loss function operates on vectors of raw logits instead of
# probabilities, only the first part of the generator needs to be
# passed to the NMTLossCompute. At the moment, the only supported
# loss function of this kind is the sparsemax loss.
use_raw_logits = isinstance(criterion, SparsemaxLoss)
loss_gen = model.generator[0] if use_raw_logits else model.generator
if opt.copy_attn:
    compute = onmt.modules.CopyGeneratorLossCompute(
        criterion, loss_gen, tgt_field.vocab, opt.copy_loss_by_seqlength,
    )
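Note that the criteria above are built with reduction='sum' and ignore_index=padding_idx, so the criterion by itself returns the total loss over the tokens it sees, not a per-token average. A quick toy check (my own tensors, nothing from OpenNMT-py) shows that this sum grows with the number of real tokens:

import math
import torch
import torch.nn as nn

pad_idx = 0
criterion = nn.NLLLoss(ignore_index=pad_idx, reduction='sum')

# Uniform log-probabilities over a toy 5-word vocabulary, 8 positions.
log_probs = torch.full((8, 5), -math.log(5.0))
short_tgt = torch.tensor([1, 2, 3, 0, 0, 0, 0, 0])  # 3 real tokens + padding
long_tgt = torch.tensor([1, 2, 3, 4, 1, 2, 3, 4])   # 8 real tokens

print(criterion(log_probs, short_tgt))  # ~ 3 * log(5) ~ 4.83
print(criterion(log_probs, long_tgt))   # ~ 8 * log(5) ~ 12.88

So if there is a division by the token count, it has to happen somewhere outside the criterion.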
vince62s (Vincent Nguyen), September 22, 2019, 8:20am, #2
Check here, in LossComputeBase in onmt/utils/loss.py:
if shard_size == 0:
    loss, stats = self._compute_loss(batch, **shard_state)
    # the whole-batch loss is divided by `normalization` here
    return loss / float(normalization), stats
batch_stats = onmt.utils.Statistics()
for shard in shards(shard_state, shard_size):
    loss, stats = self._compute_loss(batch, **shard)
    # each shard's loss is divided by the same `normalization`
    # before backprop, so the gradients are normalized as well
    loss.div(float(normalization)).backward()
    batch_stats.update(stats)
return None, batch_stats
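The `normalization` passed in there comes from the trainer. With -normalization tokens it is roughly the number of non-padding target tokens accumulated for the update; with -normalization sents (the default) it is the number of sentences. A simplified sketch of the token count (not the exact trainer code, tensor shapes may differ between versions):

def token_normalization(tgt, padding_idx):
    # tgt: target token indices for the batch; skip the first row (<bos>)
    # and count every remaining position that is not padding
    return tgt[1:].ne(padding_idx).sum().item()

So the loss, and therefore the gradients, do get normalized, just inside the loss compute rather than in the criterion itself.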