It would be useful to assist training with feedback from humans on appropriate responses, creating a negative or positive signal from human-reviewed translations. Is this in scope for OpenNMT?
We do plan on implementing reinforcement-learning-style objectives and default baselines, as well as various types of memory network support. Data collection and a front end for that type of project are beyond the scope, though. Let us know if you have ideas for an RL-style API.
Not sure about the internal API, but perhaps OpenNMT could accept an additional training file with the same line count as the parallel data, containing "0" or "1" on each line, where 1 marks the corresponding target line as an accepted translation, 0 marks it as rejected, and an empty line means no supervision signal for that input/output pair.
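To make the proposal concrete, here is a minimal sketch of how such a feedback file could be parsed into per-example rewards. The file name, function name, and the +1/-1/None reward mapping are all hypothetical illustrations, not part of OpenNMT's API.

```python
# Sketch: parse a hypothetical per-line human-feedback file.
# One label per training pair: "1" = accepted, "0" = rejected, empty = no signal.

def load_feedback(path):
    """Return one reward per line: +1.0 (accepted), -1.0 (rejected), None (no signal)."""
    rewards = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            label = line.strip()
            if label == "1":
                rewards.append(1.0)    # human accepted the translation
            elif label == "0":
                rewards.append(-1.0)   # human rejected the translation
            elif label == "":
                rewards.append(None)   # no supervision for this pair
            else:
                raise ValueError(f"unexpected label: {label!r}")
    return rewards

# Usage: the feedback file must have the same line count as the src/tgt files,
# e.g. rewards = load_feedback("train.feedback")
#      assert len(rewards) == number_of_training_pairs
```

Pairs with a None reward would simply fall back to the standard maximum-likelihood objective, while labeled pairs could contribute an additional positive or negative RL-style term.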