Reproducing "Neural Machine Translation from Simplified Translations"


I want to reproduce the experiment described in “Neural Machine Translation from Simplified Translations” to see if it works for me. I have a few questions that should help me avoid making mistakes.

If I understood correctly, the steps for training English-German would be:

  1. train a teacher model on the ENG-GER bitext corpus
  2. translate the training ENG-GER corpus with the teacher to generate ENG-sGER (here sGER means simplified German)
  3a. train a student model on the ENG-sGER data exclusively
  or 3b. train a student model on a combination of the original ENG-GER and the generated ENG-sGER

Is my understanding correct?
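
To make the question concrete, here is a minimal sketch of the data flow I have in mind. It is only my reading of the paper, not the authors' code: `build_student_corpora` and `teacher_translate` are hypothetical names, and `teacher_translate` stands in for whatever decoding call the trained teacher exposes.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (English source, German target)


def build_student_corpora(
    bitext: List[Pair],
    teacher_translate: Callable[[str], str],  # hypothetical: wraps the trained teacher
) -> Tuple[List[Pair], List[Pair]]:
    """Return (corpus_3a, corpus_3b) built from the original ENG-GER bitext."""
    # Step 2: re-translate the English side of the training data with the teacher,
    # which gives the ENG-sGER pairs.
    simplified = [(src, teacher_translate(src)) for src, _ in bitext]
    # Step 3a: the student sees only ENG-sGER.
    corpus_3a = simplified
    # Step 3b: the student sees the original ENG-GER plus the generated ENG-sGER.
    corpus_3b = bitext + simplified
    return corpus_3a, corpus_3b
```

With a real teacher, `teacher_translate` would be the slow part, so in practice I would expect to translate once and write the result to disk.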

What if ENG-sGER contains UNK tokens? Are such cases treated specially, for example removed?
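
By "removed" I mean something like the filter below. This is only a sketch of one possible treatment, not something the paper states: the `<unk>` symbol and the function name `drop_pairs_with_unk` are my assumptions, and the actual placeholder depends on the toolkit.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (English source, simplified German target)

UNK = "<unk>"  # assumed placeholder symbol; toolkit-dependent


def drop_pairs_with_unk(pairs: List[Pair], unk: str = UNK) -> List[Pair]:
    """Keep only pairs whose target side has no UNK token (whitespace tokenization)."""
    return [(src, tgt) for src, tgt in pairs if unk not in tgt.split()]
```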

The paper reports that the model in 3a gives +0.51 BLEU. Is this +0.51 on the un-simplified data?

Did anyone review the translations that differ between the teacher model and the student model? What were the impressions?


@josep.crego - can you follow up on that question?

Hello Nikolai,

The steps we followed to train a student en2de system:
1 - Train the teacher en2de system on the reference translations (training bitext)
2 - Produce a simplified German version (de’) of the training bitext using the teacher en2de model from step 1
3 - Train the student en2de’ system on the original training English (en) and the simplified German (de’) produced in step 2

Let me know if you have further questions.