I began to wonder whether the quality of the validation data affects the final results (the model’s translation quality). If I train a model for, say, 55,000 steps and then average the last 10 checkpoints, will I get different results if I use completely different validation data (ignoring all other factors)?
It seems to make no difference whether the validation data is sampled randomly (and removed from the training set) or prepared manually (balanced by topic), given that training always runs for exactly 55,000 steps and the final model is the average of the last 10 checkpoints.
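One way to see why this might hold: with a fixed step budget, checkpoint averaging is just an element-wise mean of saved weights, and validation data never enters the computation. This presumably only holds if validation is not used for anything that steers training, such as early stopping, learning-rate decay on a plateau, or choosing which checkpoints to keep. The sketch below is a minimal illustration under those assumptions. `average_last_checkpoints` is a hypothetical name, not any toolkit's API, and checkpoints are modeled as plain dicts of float lists rather than real tensors.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def average_last_checkpoints(checkpoints: Sequence[Checkpoint], n: int = 10) -> Checkpoint:
    """Return the element-wise mean of the parameters of the last `n` checkpoints.

    No validation data appears anywhere here: with a fixed number of training
    steps, the averaged model depends only on the saved weights.
    """
    if n <= 0 or len(checkpoints) < n:
        raise ValueError("need n > 0 and at least n checkpoints")
    last = checkpoints[-n:]
    averaged: Checkpoint = {}
    for name in last[0]:
        size = len(last[0][name])
        # Mean of each parameter element across the selected checkpoints.
        averaged[name] = [sum(ckpt[name][i] for ckpt in last) / n for i in range(size)]
    return averaged


if __name__ == "__main__":
    ckpts = [{"w": [0.0, 0.0]}, {"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(average_last_checkpoints(ckpts, n=2))  # {'w': [2.0, 3.0]}
```

If validation loss did drive a decision (for example, keeping the 10 *best* checkpoints instead of the last 10), a different validation set could select different checkpoints and therefore produce a different averaged model.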