Translating is a hard task that calls for a large network. A large network, in turn, forces a small vocab to avoid GPU RAM and training-time explosions.
Post-editing, on the other hand, is arguably an easier task that a smaller network can handle, and a smaller network can afford a large vocab without trouble.
Example process, with a factor of 4 between the two networks (a corpus-building sketch follows the list):
- train a first network to translate lang1 to lang2 with 1000 cells per layer and a 50k vocab
- with this model, translate the whole training set with unk replacement (beam_size=1 may be needed to get this done in a reasonable time on large sets)
- train a second network to post-edit the translated training set, lang2 to lang2 (MT output as source, original reference as target), with 250 cells per layer and a 200k vocab
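
As a rough illustration of how the lang2-to-lang2 corpus for the last step could be assembled: the sketch below simply pairs the first network's translations of the training set with the original lang2 references, line by line. The filenames (`train.mt.lang2`, `train.lang2`, `pe-train.src`, `pe-train.tgt`) are placeholders for this example only, not something imposed by the process above.

```python
# Build the post-editing training corpus by pairing the first network's
# translations of the training set with the original lang2 references.
# All filenames here are illustrative assumptions.

from pathlib import Path

MT_OUTPUT = Path("train.mt.lang2")   # lang2 output of network 1 (with unk replacement)
REFERENCE = Path("train.lang2")      # original lang2 side of the parallel corpus
PE_SRC = Path("pe-train.src")        # source side for the post-editing network
PE_TGT = Path("pe-train.tgt")        # target side for the post-editing network

def build_postedit_corpus() -> int:
    """Write (MT output, reference) pairs for training the lang2-to-lang2 post-editor."""
    kept = 0
    with MT_OUTPUT.open(encoding="utf-8") as mt, \
         REFERENCE.open(encoding="utf-8") as ref, \
         PE_SRC.open("w", encoding="utf-8") as src_out, \
         PE_TGT.open("w", encoding="utf-8") as tgt_out:
        for mt_line, ref_line in zip(mt, ref):
            mt_line, ref_line = mt_line.strip(), ref_line.strip()
            if not mt_line or not ref_line:
                continue  # skip empty translations or references
            src_out.write(mt_line + "\n")
            tgt_out.write(ref_line + "\n")
            kept += 1
    return kept

if __name__ == "__main__":
    n = build_postedit_corpus()
    print(f"wrote {n} post-editing sentence pairs")
```

The second network is then trained on pe-train.src to pe-train.tgt exactly like any other model, just with the smaller layers and the larger vocab.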
The second network could learn to put the right words in the right positions based on the first network's errors and its unk replacements. Translation quality could then be better, solving a large part of the small-vocab problem.