Translating is a hard task that calls for a large network. A large network, in turn, forces a small vocab to avoid GPU RAM and training-time explosions.
Post-editing, on the other hand, is arguably an easier task that a smaller network can handle, and a smaller network can afford a large vocab without trouble.
Example process, with a factor of 4 between the two networks (a corpus-building sketch follows the list):
- train a first network to translate lang1 to lang2 with 1000 cells per layer and a 50k vocab
- with this model, translate the whole training set with unk replacement (beam_size=1 may be needed to get this done in a reasonable time on large sets)
- train a second network to post-edit the translated training set, lang2 to lang2 (MT output as source, original reference as target), with 250 cells per layer and a 200k vocab
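
As a rough illustration of how the lang2-to-lang2 corpus for the last step could be assembled: the sketch below simply pairs the first network's translations of the training set with the original lang2 references, line by line. The filenames (`train.mt.lang2`, `train.lang2`, `pe-train.src`, `pe-train.tgt`) are placeholders for this example only, not something imposed by the process above.

```python
# Build the post-editing training corpus by pairing the first network's
# translations of the training set with the original lang2 references.
# All filenames here are illustrative assumptions.

from pathlib import Path

MT_OUTPUT = Path("train.mt.lang2")   # lang2 output of network 1 (with unk replacement)
REFERENCE = Path("train.lang2")      # original lang2 side of the parallel corpus
PE_SRC = Path("pe-train.src")        # source side for the post-editing network
PE_TGT = Path("pe-train.tgt")        # target side for the post-editing network

def build_postedit_corpus() -> int:
    """Write (MT output, reference) pairs for training the lang2-to-lang2 post-editor."""
    kept = 0
    with MT_OUTPUT.open(encoding="utf-8") as mt, \
         REFERENCE.open(encoding="utf-8") as ref, \
         PE_SRC.open("w", encoding="utf-8") as src_out, \
         PE_TGT.open("w", encoding="utf-8") as tgt_out:
        for mt_line, ref_line in zip(mt, ref):
            mt_line, ref_line = mt_line.strip(), ref_line.strip()
            if not mt_line or not ref_line:
                continue  # skip empty translations or references
            src_out.write(mt_line + "\n")
            tgt_out.write(ref_line + "\n")
            kept += 1
    return kept

if __name__ == "__main__":
    n = build_postedit_corpus()
    print(f"wrote {n} post-editing sentence pairs")
```

The second network is then trained on pe-train.src to pe-train.tgt exactly like any other model, just with the smaller layers and the larger vocab.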
The second network could learn to put the right words in the right positions based on the first network's errors and its unk replacements. Translation quality could then be better, solving a large part of the small-vocab problem.