If I understand Sequence-Level Knowledge Distillation correctly, the idea there is to 1) generate simplified translations using a "teacher" network and then 2) train another (student) model with the simplified translations as the target sentences.
My use case is different. I want to tag each word in a sentence with a label such as person, color, or location. During training I'll have labels like:
John painted a blue sky
..., none, 'a': none, 'blue': color, ...
During prediction I have partially labeled sentences (say, the label for color is available but not the label for person, etc.).
Input sentence during prediction:
Jane's dress is brown
Labels available:
Jane's: not available, dress: not available, is: not available, brown: color
During beam search I want to force the label for brown to be color, since I already know the label for that word.
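To make the request concrete, here is a minimal sketch of the behavior being asked for, written independently of any toolkit. It assumes the tagger produces a log-probability for every label at every word (the `emissions` values below are made up for illustration), and it represents the partial labels as a list with the known tag or `None` per word. The names `constrained_beam_search`, `emissions`, `known`, and `transitions` are hypothetical; this is not an existing API of any library. Forcing a label simply means that, at that position, only the known tag is expanded, which is equivalent to setting every other tag's log-probability to -inf.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def constrained_beam_search(
    emissions: Sequence[Dict[str, float]],
    known: Sequence[Optional[str]],
    beam_size: int = 3,
    transitions: Optional[Dict[Tuple[Optional[str], str], float]] = None,
) -> List[str]:
    """Return the best tag sequence, forcing the tag wherever known[i] is set.

    emissions[i][tag] -- log-probability of `tag` at word i (hypothetical model output)
    known[i]          -- the tag already known for word i, or None if unknown
    transitions       -- optional log-scores for (previous_tag, tag) pairs
    """
    transitions = transitions or {}
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for i, scores in enumerate(emissions):
        # A known label collapses the candidate set at this position to one tag.
        allowed = [known[i]] if known[i] is not None else list(scores)
        candidates = []
        for beam_score, tags in beams:
            prev = tags[-1] if tags else None
            for tag in allowed:
                step = scores.get(tag, NEG_INF) + transitions.get((prev, tag), 0.0)
                candidates.append((beam_score + step, tags + [tag]))
        # Keep only the `beam_size` highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][1]


# Hypothetical per-word probabilities for "Jane's dress is brown".
emissions = [
    {tag: math.log(p) for tag, p in probs.items()}
    for probs in [
        {"person": 0.6, "none": 0.3, "color": 0.1},    # Jane's
        {"person": 0.1, "none": 0.7, "color": 0.2},    # dress
        {"person": 0.05, "none": 0.9, "color": 0.05},  # is
        {"person": 0.1, "none": 0.6, "color": 0.3},    # brown
    ]
]
known = [None, None, None, "color"]  # only the label for "brown" is known

print(constrained_beam_search(emissions, known))
# ['person', 'none', 'none', 'color']
print(constrained_beam_search(emissions, [None] * 4))
# ['person', 'none', 'none', 'none']
```

Applying the constraint while expanding hypotheses, rather than filtering finished beams, means no beam slot is spent on a sequence that contradicts the known labels. For tagging, the position to constrain is fixed by the input word; in an MT decoder the same masking idea would apply at the step where the forced token should appear, but deciding which step that is becomes the harder part.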
Note: this kind of partially labeled data arises frequently in practice, at least for tagging tasks. For MT too, if you know part of the translation from human annotators with high confidence and are trying to improve on the rest, this feature might be useful.
- Is the use case clear enough from my description?
- Is this kind of feature planned?
- If not, can you point me to the files that would need to be modified?