External reward signal for human-in-the-loop training


(Alex Graveley) #1

It would be useful to supplement training with human feedback on responses, deriving a positive or negative signal from human-reviewed translations. Is this in scope for OpenNMT?

Some background:
Dialogue Learning with Human-in-the-Loop - https://arxiv.org/pdf/1611.09823v2.pdf
Dialog-based Language Learning - https://arxiv.org/pdf/1604.06045v7.pdf


(srush) #2

We do plan on implementing reinforcement-learning-style objectives and default baselines, as well as various types of memory network support. Data collection and a front end for that type of project are beyond scope, though. Let us know if you have ideas for an RL-style API.
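
For concreteness, an RL-style objective with a baseline could look like the following REINFORCE sketch. This is PyTorch-flavored and purely illustrative, assuming per-sequence log-probabilities and scalar rewards are already available; none of these names are part of the OpenNMT API:

```python
# REINFORCE-style loss with a simple batch-mean baseline. Illustrative only:
# none of these names exist in OpenNMT.
import torch

def rl_loss(seq_log_probs, rewards, baseline):
    # seq_log_probs: (batch,) summed log-probs of sampled translations
    # rewards:       (batch,) scalar reward per translation (e.g. human 0/1)
    # baseline:      scalar subtracted from rewards to reduce variance
    advantage = rewards - baseline
    # Push up the log-probability of sequences with positive advantage.
    return -(advantage * seq_log_probs).mean()

# Toy usage: one accepted (reward 1) and one rejected (reward 0) sample.
log_probs = torch.tensor([-3.2, -4.1], requires_grad=True)
rewards = torch.tensor([1.0, 0.0])
loss = rl_loss(log_probs, rewards, baseline=rewards.mean())
loss.backward()
```

A learned or running-average baseline would be the usual next step; the batch mean is just the simplest variance reducer.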


(Alex Graveley) #3

Not sure about the internal API, but perhaps accept an additional training file with the same line count as the training data, containing "0" or "1" on each line.

A 1 would mark the corresponding translation as accepted, a 0 as rejected, and an empty line would mean no supervision signal for that input/output pair. A minimal reader for this format is sketched below.
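
As a sketch of that proposal (the function name and file name are made up, not OpenNMT code):

```python
# Hypothetical reader for the proposed reward file:
# "1" = accepted translation, "0" = rejected, empty line = no signal.
def read_rewards(path):
    rewards = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            rewards.append(int(line) if line else None)
    return rewards

# e.g. read_rewards("train.rewards") -> [1, 0, None, 1, ...]
```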