It would be useful to assist training with feedback from humans on appropriate responses, creating a negative or positive signal from human-reviewed translations. Is this in scope for OpenNMT?
We do plan on implementing reinforcement-learning-style objectives and default baselines, as well as various types of memory network support. Data collection and a front end for that type of project are beyond the scope, though. Let us know if you have ideas for an RL-style API.
Not sure about the internal API, but perhaps OpenNMT could accept an additional training file with the same line count as the parallel data, containing "0" or "1" on each line, where 1 marks the corresponding target line as an accepted translation, 0 marks it as rejected, and an empty line means no supervision signal for that input/output pair.
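To make the proposal concrete, here is a minimal sketch of how such a feedback file could be parsed into per-example rewards. The file name, function name, and the +1/-1/None reward mapping are all hypothetical illustrations, not part of OpenNMT's API.

```python
# Sketch: parse a hypothetical per-line human-feedback file.
# One label per training pair: "1" = accepted, "0" = rejected, empty = no signal.

def load_feedback(path):
    """Return one reward per line: +1.0 (accepted), -1.0 (rejected), None (no signal)."""
    rewards = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            label = line.strip()
            if label == "1":
                rewards.append(1.0)    # human accepted the translation
            elif label == "0":
                rewards.append(-1.0)   # human rejected the translation
            elif label == "":
                rewards.append(None)   # no supervision for this pair
            else:
                raise ValueError(f"unexpected label: {label!r}")
    return rewards

# Usage: the feedback file must have the same line count as the src/tgt files,
# e.g. rewards = load_feedback("train.feedback")
#      assert len(rewards) == number_of_training_pairs
```

Pairs with a None reward would simply fall back to the standard maximum-likelihood objective, while labeled pairs could contribute an additional positive or negative RL-style term.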