Learn generalized pattern with OpenNMT

kaihuchen · March 25, 2018, 9:04pm

I have been using OpenNMT-tf to translate synthesized English phrases into a simple form of knowledge representation (KR). I want OpenNMT to learn generalized patterns that work even for words that are absent from the training set (but present in the overall vocab), and I am looking for advice on how to make that happen.

For example, a training sample has the following form:

Source English: John gave the ball to Mary
Target KR: PTRANS ball John Mary

The target KR above indicates the physical transfer of a ball from John to Mary. Similarly, the following indicates a transfer in the opposite direction:

Source English: John took the ball from Mary
Target KR: PTRANS ball Mary John
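
For illustration, pairs of this shape could be produced by a small generator like the one below. This is a minimal sketch and not the script actually used for the dataset: the name and object lists, the `TEMPLATES` table, and the `render` and `make_pair` helpers are all hypothetical.

```python
import random

# Hypothetical vocabularies; the real lists are not given in this thread.
NAMES = ["John", "Mary", "Dan", "Jess"]
OBJECTS = ["ball", "book", "cup"]

# Each template pairs a source phrase with its KR target.
# KR layout: PTRANS <object> <giver> <receiver>
TEMPLATES = [
    ("{a} gave the {o} to {b}", "PTRANS {o} {a} {b}"),
    ("{a} took the {o} from {b}", "PTRANS {o} {b} {a}"),
]

def render(template, a, o, b):
    """Fill one (source, target) template with two names and an object."""
    src, tgt = template
    return src.format(a=a, o=o, b=b), tgt.format(a=a, o=o, b=b)

def make_pair(rng, names=NAMES, objects=OBJECTS):
    """Draw two distinct names, one object, and one template at random."""
    a, b = rng.sample(names, 2)
    o = rng.choice(objects)
    return render(rng.choice(TEMPLATES), a, o, b)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_pair(rng))
```

Passing a reduced `names` list to `make_pair` is one way to hold some names out of the training data while keeping them in the vocab.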

OpenNMT performs well on a synthesized training dataset like the above, built from random combinations of many names, actions, and objects. However, if a test sample contains a name that OpenNMT never observed in the training dataset, the test always fails. For example, if the test English phrase is:

Test Source sample: John gave the ball to Jess

where the name ‘Jess’ is in the vocab but never appeared in the training dataset, then running it through a trained OpenNMT model yields a prediction something like:

Predicted KR: PTRANS ball John Dan

where the ‘Dan’ part varies but is always wrong. The correct answer should be:

Correct answer: PTRANS ball John Jess

For my purposes this is a big problem, since limiting my system to names seen during training is not acceptable.

I understand that OpenNMT is not meant for the kind of generalization that infers higher-level patterns from ground facts, and that this limitation is not specific to OpenNMT. What I am looking for here is a way to get OpenNMT to behave closer to what I described above.

Any advice is appreciated.

guillaumekln · March 28, 2018, 2:55pm

Did you already explore this task as a data preparation problem? E.g., replacing named entities with placeholders, or even replacing some words with their POS tags?

kaihuchen · March 28, 2018, 5:14pm

@guillaumekln Yes, I have. Adding something like POS tags to the training data does help in other ways (e.g., better test results when the text is more complex, such as when it includes adjectives), but it makes no difference at all for the problem mentioned above.

guillaumekln · March 29, 2018, 7:35am

Maybe you could rely on an external NER tool to replace entities, e.g., from:

John took the ball from Mary

to:

__ent_person_1 took the ball from __ent_person_2

It should be a first step in solving this issue.

kaihuchen · March 29, 2018, 3:25pm

@guillaumekln I have already tried that as well, and it did not help at all. This was the case whether I simply replaced names with the NER entity tokens, or fed the NER token version as a separate input stream in parallel with the plain English stream during training.

I wonder if this is because, at inference time, OpenNMT (or perhaps any similar NN system) simply cannot account for tokens not seen during training. It falls back on the probability distributions learned for other tokens, and so appears to make almost random predictions for the unseen names.

One workaround that I have is as follows:

1. Include some special NER tokens in the training data, e.g., the __ent_person_1 and __ent_person_2 tokens mentioned above.
2. At prediction time:
a. First replace the names with NER tokens (i.e., change “John took the ball from Jess” to “__ent_person_1 took the ball from __ent_person_2”).
b. Make the prediction to get “PTRANS ball __ent_person_2 __ent_person_1”.
c. Then replace the NER tokens with the original names to get “PTRANS ball Jess John”.

This way I am effectively adding external logic outside of OpenNMT to give the NER tokens special meanings. It kind of works but is a little messy, so I am still looking for a better solution.
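
The mask-and-restore steps of this workaround can be sketched in Python as follows. This is a minimal illustration, not OpenNMT code. It assumes whitespace tokenization, and the fixed `PERSON_NAMES` set is a hypothetical stand-in for a real NER tool. The model call itself is left out: the `prediction` string below is simply what a model trained on masked data would be expected to return.

```python
# Hypothetical stand-in for an NER tool: a fixed set of known person names.
PERSON_NAMES = {"John", "Mary", "Dan", "Jess"}

def mask_names(sentence, names=PERSON_NAMES):
    """Replace each distinct name with __ent_person_N.

    Returns the masked sentence and a placeholder -> name mapping
    so that the names can be restored after prediction.
    """
    seen = {}      # original name -> placeholder
    mapping = {}   # placeholder -> original name
    out = []
    for token in sentence.split():
        if token in names:
            if token not in seen:
                placeholder = f"__ent_person_{len(seen) + 1}"
                seen[token] = placeholder
                mapping[placeholder] = token
            out.append(seen[token])
        else:
            out.append(token)
    return " ".join(out), mapping

def unmask_names(prediction, mapping):
    """Put the original names back into the predicted KR string."""
    return " ".join(mapping.get(tok, tok) for tok in prediction.split())

masked, mapping = mask_names("John gave the ball to Jess")
# masked == "__ent_person_1 gave the ball to __ent_person_2"

# Assumed model output for the masked input (no model is run here).
prediction = "PTRANS ball __ent_person_1 __ent_person_2"
print(unmask_names(prediction, mapping))  # PTRANS ball John Jess
```

Because the placeholders are numbered in order of first appearance, a name that occurs twice in one sentence maps to the same token both times.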

guillaumekln · March 29, 2018, 3:32pm

The workaround you mentioned is precisely what I was referring to. I think it is very common to preprocess and postprocess the segments this way.

kaihuchen · March 29, 2018, 4:22pm

@guillaumekln Understood. Thank you for your input!

I would love to see OpenNMT one day have a built-in feature for auto-generating useful tokens, which would achieve a sort of higher-level generalization capability. For example, if the system notices that those ‘name tokens’ behave similarly in the same context, then a new special token could be created to represent a category that covers those names. The system could then be trained further to verify that those new tokens are indeed useful new categories.

I believe that having this kind of generalization power will make OpenNMT immensely more powerful.

kaihuchen · April 1, 2018, 3:45pm

@guillaumekln To proceed with the approach mentioned above, I am going to add external logic to alter the input data, which basically is a way of generalizing the data (e.g., by changing names like Jess to PROPER_NOUN, and perhaps also similar changes based on other POS tags).

Since my system will need all of the good predictions and not just one, I think I will end up with multiple OpenNMT instances, each learning its own type of prediction (e.g., one learns to predict PTRANS ball John Mary, another learns to predict PTRANS ball PROPER_NOUN_1 PROPER_NOUN_2, etc.).

My question is this: given multiple predictions like these, is there any way to get an OpenNMT metric for the reliability of each prediction, so that I can decide which one to trust more?

guillaumekln · April 3, 2018, 9:16am

Can you elaborate on why the output “PTRANS ball PROPER_NOUN_1 PROPER_NOUN_2” is not enough for your use case? You should be able to copy the original source words in your external logic.

kaihuchen · April 9, 2018, 12:42am

To clarify, the external logic solution mentioned above does work and is sufficient for the case that I described in my original post above.

However, I want to expand this operation to many other types of token replacement, such as using various NLP tools to replace John with john.n.01, or Jess with NAME_OF_FEMALE_PERSON.
Doing so means I will end up with many instances of OpenNMT, each making its own prediction for a different type of token with a varying degree of certainty. I am therefore looking for a way to evaluate how good each prediction is, and I am wondering whether OpenNMT exposes anything about the certainty of each prediction.
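
One way to sketch such a comparison, assuming each instance can return a total log-probability alongside its prediction (an assumption to verify against the OpenNMT version in use; nothing in this thread confirms it), is to rank the candidates by length-normalized log-probability. The `Hypothesis` type, the model labels, and the score values below are hypothetical and purely illustrative.

```python
from typing import List, NamedTuple

class Hypothesis(NamedTuple):
    model: str          # hypothetical label for the instance that produced it
    tokens: List[str]   # predicted KR tokens
    log_prob: float     # assumed total log-probability of the prediction

def normalized_score(h: Hypothesis) -> float:
    """Average log-probability per token, so longer outputs are not penalized."""
    return h.log_prob / max(len(h.tokens), 1)

def rank(hyps: List[Hypothesis]) -> List[Hypothesis]:
    """Sort hypotheses from most to least confident under the assumed score."""
    return sorted(hyps, key=normalized_score, reverse=True)

# Made-up scores, for illustration only.
candidates = [
    Hypothesis("plain", "PTRANS ball John Dan".split(), -2.0),
    Hypothesis("ner", "PTRANS ball __ent_person_1 __ent_person_2".split(), -0.4),
]
best = rank(candidates)[0]
print(best.model, normalized_score(best))
```

Scores from separately trained models are not calibrated against one another, so a ranking like this is at best a rough heuristic. Checking each instance's scores against held-out data would be needed before trusting the comparison.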