I trained an NMT model using a GRU encoder and a GRU decoder.
It works, but the TensorBoard logs look strange.
For instance, the recorded loss measures the distance from the softmax output distribution to the embedded target. During training this value hovers around 100000 (per sentence), yet the validation perplexity keeps decreasing and the validation accuracy keeps increasing.
Is the validation loss computed differently from the training loss? Why does the training loss stay flat while the perplexity keeps decreasing?
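To make the question concrete, here is a minimal sketch of one possible explanation. It assumes that the logged training loss is the sum (not the mean) of token-level cross-entropy in natural log over all target tokens in a logging step, and that validation perplexity is exp of the mean per-token loss. The function names, the token count of 20000, and the loss values are hypothetical and chosen for illustration; they are not taken from my actual logs or from any specific toolkit.

```python
import math


def token_cross_entropy(probs, target_index):
    """Cross-entropy for one target token: -log p(target) under the softmax output."""
    return -math.log(probs[target_index])


def per_token_loss(summed_loss, num_tokens):
    """Mean negative log-likelihood per target token (assumes a summed loss)."""
    return summed_loss / num_tokens


def perplexity(summed_loss, num_tokens):
    """Perplexity = exp(mean per-token cross-entropy)."""
    return math.exp(per_token_loss(summed_loss, num_tokens))


if __name__ == "__main__":
    # One token: softmax output over a 3-word vocabulary, correct word at index 1.
    print(token_cross_entropy([0.1, 0.7, 0.2], 1))

    # Hypothetical summed losses for a step with 20000 target tokens.
    num_tokens = 20000
    for summed in (100000.0, 97000.0, 95000.0):
        print(summed, perplexity(summed, num_tokens))
```

Under these assumptions, a 5% drop in the summed loss (100000 to 95000) corresponds to roughly a 22% drop in perplexity (about 148 to about 116), so a loss curve plotted at this scale can look nearly flat while perplexity falls noticeably.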