Translating is a hard task that calls for a large network. Using a large network forces you to use a small vocab, to avoid GPU RAM and training-time explosions.
But post-editing is arguably a lighter task that can be done with a smaller network, and a smaller network can handle a large vocab without trouble.
Example process with a factor of 4 between the two network sizes (see the sketch after the steps):
1. train a first network to translate lang1 to lang2, with 1000 cells per layer and a 50k vocab
2. with this model, translate the whole training set (beam_size=1 may be needed to finish in a reasonable time on large sets), with unk replacement
3. train a second network to post-edit the translated training set (lang2 to lang2), with 250 cells per layer and a 200k vocab
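Here is a minimal sketch of that pipeline. The `train_model` and `translate_corpus` functions are hypothetical placeholders standing in for whatever toolkit commands you actually use (train / translate); only the sizes, beam setting, and unk replacement come from the steps above.

```python
def train_model(src_file, tgt_file, rnn_size, vocab_size):
    """Placeholder: train a seq2seq model and return a handle to it."""
    raise NotImplementedError("wrap your toolkit's training command here")

def translate_corpus(model, src_file, out_file, beam_size, replace_unk):
    """Placeholder: translate src_file with `model`, writing to out_file."""
    raise NotImplementedError("wrap your toolkit's translate command here")

# Stage 1: large network, small vocab -- learns the hard translation task.
translator = train_model("train.lang1", "train.lang2",
                         rnn_size=1000, vocab_size=50_000)

# Translate the whole training set with the first model. beam_size=1 keeps
# this tractable on a large corpus; unk replacement copies the aligned
# source token for each out-of-vocabulary word.
translate_corpus(translator, "train.lang1", "train.mt.lang2",
                 beam_size=1, replace_unk=True)

# Stage 2: small network, large vocab -- learns to post-edit the draft
# (lang2 to lang2), mainly fixing lexical choices and <unk> replacements.
post_editor = train_model("train.mt.lang2", "train.lang2",
                          rnn_size=250, vocab_size=200_000)
```

At inference time, the same chain applies: translate with the first model, then pass its output through the second one.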
The second network might learn to put the right words in the right positions, depending on the first network's errors and its unk replacements. The overall translation quality could improve, solving a large part of the small-vocab problem.
OK, but my purpose here isn't to train a real post-editing model in the usual sense. It's a complementary (automatic) training in two stages, using only the training set, and two ways of balancing the neural network constraints.
PS: I see that I wasn't so clear. I changed the title; hopefully it avoids confusion.
My first test is a bit disappointing, because I'm using beam_size=1 to get a fast translation in step 2. The second model does improve the translation a bit, but since it starts from a lower-quality draft, the result is still below the quality obtained with only the first model at the default beam size.
Yes. The main goal was to try balancing network size against vocab size: the first model is supposed to learn complex sentences using a large network and a small vocab, while the second one is supposed to fix vocabulary problems, with very little to change in the formulation, using a small network and a large vocab.
I should perhaps try such experiments again, improving them with the year of know-how gained since that date.