I want to reproduce the experiment described in “Neural Machine Translation from Simplified Translations” https://arxiv.org/abs/1612.06139 to see if it works for me. I have a few questions that should help me avoid making mistakes.
If I understood correctly, the steps for training English-German would be:
1. train a teacher model on the ENG-GER bitext corpus
2. translate the source side of the ENG-GER training corpus with the teacher to generate ENG-sGER (here sGER means simplified German)
3a. train a student model on the ENG-sGER data exclusively
3b. or train a student model on a combination of the original ENG-GER and the generated ENG-sGER
Is my understanding correct?
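To make the question concrete, here is a minimal sketch of how I picture the data side of steps 2 and 3. It is only my reading of the procedure, not code from the paper: `build_student_data`, the `combine` flag and the `translate` callable (standing in for the trained teacher model) are hypothetical names I made up for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (English source, German target)


def build_student_data(
    bitext: List[Pair],
    translate: Callable[[str], str],
    combine: bool = False,
) -> List[Pair]:
    """Build the student's training pairs from the original bitext.

    `translate` is a placeholder for the teacher model (assumption).
    combine=False corresponds to option 3a (ENG-sGER only);
    combine=True corresponds to option 3b (ENG-GER plus ENG-sGER).
    """
    # Step 2: re-translate each source sentence with the teacher,
    # discarding the original reference translation.
    simplified = [(src, translate(src)) for src, _ in bitext]
    # Step 3a or 3b: use the simplified pairs alone or append them
    # to the original pairs.
    return bitext + simplified if combine else simplified
```

With a bitext of N sentence pairs, option 3a would give N training pairs and option 3b would give 2N.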
What if ENG-sGER contains UNK tokens? Are such cases handled specially, for example by removing them?
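One option I am considering, purely as my own assumption and not something I found in the paper, is to drop every pair whose generated target contains the unknown-word token. The function name `drop_unk_pairs` and the default token string `<unk>` are hypothetical; the actual token depends on the toolkit.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (English source, simplified German target)


def drop_unk_pairs(pairs: List[Pair], unk_token: str = "<unk>") -> List[Pair]:
    """Keep only pairs whose target side has no unknown-word token.

    Assumes targets are whitespace-tokenized and that `unk_token`
    matches the string the teacher emits for unknown words.
    """
    return [(src, tgt) for src, tgt in pairs if unk_token not in tgt.split()]
```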
The paper reports that the model in 3a gives +0.51 BLEU. Is this +0.51 measured on the original, un-simplified data?
Has anyone reviewed the translations that differ between the teacher model and the student model? What were the impressions?