I'm not sure the number of unknown words is the best criterion. A sentence with several rare words could still be very interesting, while a sentence with only common words might be of no real interest.
Perhaps I would test something like:
Randomly select a given number of sentences: 2M, 5M, … ?
Quickly train a model on them.
Quickly translate your whole corpus with beam_size=1. This could be very time-consuming if your data set is really huge, so you may also have to restrict this step to a randomly chosen subset.
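
For the random selection step, here is a minimal Python sketch. It assumes your corpus is a pair of line-aligned source and target files (or any two aligned iterables of strings). It uses reservoir sampling, so the data is read in a single pass and never has to fit in memory. The name `sample_pairs` and the `seed` parameter are my own for illustration and do not come from any toolkit.

```python
import random
from typing import Iterable, List, Tuple


def sample_pairs(
    src: Iterable[str], tgt: Iterable[str], k: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Reservoir-sample k aligned (source, target) pairs in a single pass."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[Tuple[str, str]] = []
    for i, pair in enumerate(zip(src, tgt)):
        if i < k:
            # Fill the reservoir with the first k pairs.
            reservoir.append(pair)
        else:
            # Keep the new pair with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair
    return reservoir


# Toy data standing in for a real corpus (hypothetical content).
src_lines = [f"source sentence {i}" for i in range(1000)]
tgt_lines = [f"target sentence {i}" for i in range(1000)]
subset = sample_pairs(src_lines, tgt_lines, k=5)
print(len(subset))
```

With real files you would pass the two open file handles instead of the lists. The same function can be reused to pick the random subset to translate if the full corpus is too large.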