The best test set depends on the kind of data you used to train your model.
For instance, if you trained on newswire data, the News Commentary test and dev sets
are a good choice for evaluating your model.
If you trained on biomedical data, biomedical test and dev sets are available as well.
Several different corpora exist, depending on the domain you want to work in.
Typically, you can find these train, dev and test sets in the WMT shared translation tasks
(the most recent one is WMT 2017).
You can also find more corpora on the OPUS website.
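
To make the rule of thumb concrete, here is a minimal sketch in Python that maps a training domain to candidate evaluation sets. The function name suggest_eval_sets and the dictionary CANDIDATE_EVAL_SETS are hypothetical, and the entries are descriptive labels that only restate the suggestions above, not actual file names or download paths.

```python
from typing import Dict, List

# Hypothetical lookup table: training-data domain -> suggested evaluation sets.
# The values are descriptive labels taken from the advice above, not real file names.
CANDIDATE_EVAL_SETS: Dict[str, List[str]] = {
    "news": ["News Commentary dev/test sets (WMT shared task)"],
    "biomedical": ["biomedical dev/test sets"],
}

# Fallback suggestion when the domain is not in the table.
FALLBACK: List[str] = ["look for an in-domain corpus on OPUS"]


def suggest_eval_sets(train_domain: str) -> List[str]:
    """Return evaluation-set suggestions matching the training domain."""
    key = train_domain.strip().lower()
    return CANDIDATE_EVAL_SETS.get(key, FALLBACK)


if __name__ == "__main__":
    for domain in ("news", "biomedical", "legal"):
        print(domain, "->", suggest_eval_sets(domain))
```

The point is only that the evaluation data should match the training domain; for any domain missing from the table, the sketch falls back to searching OPUS.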