Here is an extremely simplified example. Suppose you are trying to learn a linear model from a single blue training point, with a single red validation point. This is the usual situation with deep neural networks: you have far more degrees of freedom than samples. There are infinitely many perfect solutions, all overfitting your single point while doing whatever they want on the red validation point.
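A minimal sketch of that degeneracy, assuming (as a made-up illustration) that the blue training point is (x=1, y=1) and the red validation point is (x=2, y=1.2). With a 2-parameter line y = w*x + b, any pair with w + b = 1 fits the training point exactly, yet the validation error varies wildly:

```python
# One blue training point and one red validation point (both assumed values).
x_train, y_train = 1.0, 1.0
x_val, y_val = 2.0, 1.2

# Three of the infinitely many perfect fits: all satisfy w + b = 1.
perfect_fits = [(1.0, 0.0), (0.5, 0.5), (-2.0, 3.0)]
for w, b in perfect_fits:
    train_err = (w * x_train + b - y_train) ** 2  # always 0.0
    val_err = (w * x_val + b - y_val) ** 2        # anything at all
    print(f"w={w:+.1f}, b={b:+.1f}  train={train_err:.2f}  val={val_err:.2f}")
```

Every model in the list is "perfect" as far as training is concerned; the validation point alone decides between them.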
So, now, look at the training curve from epoch E1 to E6. At every epoch the training error decreases: the model keeps getting better on the blue training point. From E1 to E3 the validation error on the red point also decreases. At E4 it increases. At E5 it is even an exact match on the validation point. At the final epoch E6 it decreases again.
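You can reproduce this kind of behaviour with a toy gradient-descent run. The numbers below (training point (1, 1), validation point (2, 1.2), learning rate 0.15) are assumptions made up for the illustration; the exact epoch-by-epoch pattern differs from the E1-E6 curve above, but it shows the same phenomenon: training error falls monotonically while validation error dips and then rises.

```python
# Toy gradient descent on the single blue point with model y = w*x + b.
x_t, y_t = 1.0, 1.0   # blue training point (assumed)
x_v, y_v = 2.0, 1.2   # red validation point (assumed)
w, b, lr = 0.0, 0.0, 0.15

trains, vals = [], []
for epoch in range(1, 7):                  # epochs E1..E6
    grad = 2 * (w * x_t + b - y_t)         # gradient of the squared error
    w -= lr * grad * x_t
    b -= lr * grad
    trains.append((w * x_t + b - y_t) ** 2)
    vals.append((w * x_v + b - y_v) ** 2)
    print(f"E{epoch}: train={trains[-1]:.4f}  val={vals[-1]:.4f}")
```

Training error shrinks at every step, yet after E2 the validation error climbs again: nothing in the optimization is steering the model toward the red point.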
This validation evaluation during training is of no real interest, because you do not know what the model will be at the next epoch, and you do not know where the other red points lie, so you cannot be sure your validation set is really representative of the whole curve you are searching for.
The only interesting quantity is the final error on the validation points, once convergence has been reached, compared across many tested network structures: that is certainly a good way to evaluate whether your model has the right size, i.e. a good bias-variance trade-off.
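A sketch of that model-selection recipe, using polynomial degree as a stand-in for "network structure" (the data, split, and degree range are all assumptions for the illustration): fit each candidate capacity to convergence, then keep the one with the lowest final validation error.

```python
import numpy as np

# Toy data: a noisy sine wave, split alternately into train/validation.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def val_mse(degree):
    """Final validation error of a degree-`degree` polynomial fit to convergence."""
    coeffs = np.polyfit(x_tr, y_tr, degree)   # exact least-squares fit
    return float(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))

# Compare model "sizes" by their final validation error.
best = min(range(1, 7), key=val_mse)
print("degree chosen by final validation error:", best)
```

Here each fit is a converged model of a given capacity, and the comparison happens only between final models, which is exactly the use of the validation set the answer argues for.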