OpenNMT Forum

External reward signal for human-in-the-loop training

It would be useful to guide training with human feedback, deriving a positive or negative signal from human-reviewed translations. Is this in scope for OpenNMT?

Some background:
Dialog Learning with Human-in-the-Loop - https://arxiv.org/pdf/1611.09823v2.pdf
Dialog-based Language Learning - https://arxiv.org/pdf/1604.06045v7.pdf

We do plan on implementing reinforcement-learning-style objectives and default baselines, as well as various types of memory network support. Data collection and a front end for that type of project are beyond our scope, though. Let us know if you have ideas for an RL-style API.

I'm not sure about the internal API, but perhaps training could accept an additional file with the same line count as the training data, where each line contains either “0” or “1”.

Here 1 marks an accepted translation of the corresponding input line and 0 a rejected one; an empty line means there is no supervision signal for that input/output pair.
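To make the proposed file format concrete, here is a rough Python sketch of how such a feedback file might be read and lined up with the training data. None of this is OpenNMT code: the function names (`parse_feedback`, `to_rewards`, `align`) are hypothetical, and the mapping of 1/0/empty to rewards of +1/-1/0 is only one assumed way to turn the labels into a positive or negative signal.

```python
from typing import Iterable, List, Optional, Tuple


def parse_feedback(lines: Iterable[str]) -> List[Optional[int]]:
    """Parse feedback lines into 1 (accepted), 0 (rejected), or None (no signal)."""
    signals: List[Optional[int]] = []
    for line_no, raw in enumerate(lines, start=1):
        value = raw.strip()
        if value == "":
            # Empty line: no human judgment for this pair.
            signals.append(None)
        elif value in ("0", "1"):
            signals.append(int(value))
        else:
            raise ValueError(
                f"line {line_no}: expected '0', '1', or an empty line, got {value!r}"
            )
    return signals


def to_rewards(signals: List[Optional[int]]) -> List[float]:
    """Assumed mapping: accepted -> +1.0, rejected -> -1.0, no signal -> 0.0."""
    mapping = {1: 1.0, 0: -1.0, None: 0.0}
    return [mapping[s] for s in signals]


def align(
    sources: List[str], targets: List[str], signals: List[Optional[int]]
) -> List[Tuple[str, str, Optional[int]]]:
    """Pair each source/target line with its feedback, enforcing equal line counts."""
    if not (len(sources) == len(targets) == len(signals)):
        raise ValueError(
            "source, target, and feedback files must have the same line count"
        )
    return list(zip(sources, targets, signals))


# Usage with in-memory lines standing in for the three files.
sources = ["hello world", "good morning", "thank you"]
targets = ["bonjour le monde", "bon matin", "merci"]
feedback_lines = ["1\n", "0\n", "\n"]

signals = parse_feedback(feedback_lines)
examples = align(sources, targets, signals)
rewards = to_rewards(signals)
```

Keeping `None` distinct from 0 until the last step preserves the difference between a rejected translation and one nobody reviewed, so a training loop could skip unreviewed pairs instead of treating them as neutral.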