How does the quality of the validation dataset affect the final results of a model trained for a fixed number of steps?


I began to wonder whether the quality of the validation data somehow affects the final results (the model's translation quality). If I train a model for, say, 55,000 steps and then average the last 10 checkpoints, will I get different results if I use completely different validation data (ignoring other factors)?
It seems that, as long as training always runs for 55,000 steps and the last 10 checkpoints are averaged, it makes no difference whether the validation data is sampled randomly (and removed from the training dataset) or prepared manually (balanced by topic).


The validation set in itself does not impact the result.

However, if the validation data is extracted from the training data, then the training data itself is different, which can have an impact on the final result. In practice, the validation data is very small compared to the training data, so this impact should be negligible.
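To make this concrete, here is a minimal sketch of checkpoint averaging (the kind of procedure the question describes). Parameters are represented as plain Python lists of floats rather than real weight tensors, and the checkpoint format is invented for illustration. Note that the validation set appears nowhere in this computation: with a fixed step count and no early stopping, validation only produces metrics for monitoring.

```python
# Sketch: average the parameters of the last N checkpoints.
# Each checkpoint is a dict mapping parameter names to lists of
# floats (a stand-in for real weight tensors).

def average_checkpoints(checkpoints):
    """Return a parameter dict whose values are element-wise means."""
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        params = [ckpt[name] for ckpt in checkpoints]
        averaged[name] = [sum(vals) / n for vals in zip(*params)]
    return averaged

# Toy example: three "checkpoints" of a two-parameter model.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [3.0]},
    {"w": [5.0, 6.0], "b": [6.0]},
]
print(average_checkpoints(ckpts))  # {'w': [3.0, 4.0], 'b': [3.0]}
```

The validation data could change the outcome only if it fed back into training, e.g. through early stopping or a learning-rate schedule triggered by validation scores.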


Thanks a lot!