Catastrophic Forgetting after Domain Adaptation - Ideas to solve

Hi,

I have used OpenNMT-tf to build a decent English-Spanish model that is first trained on general-domain data (like Europarl) and then domain-adapted on my custom data set. The model works very well on my domain data, but it now fails miserably on general-domain English sentences like “I ate an apple today”. I didn’t realize this is an actual phenomenon in neural networks called catastrophic forgetting, and some of the proposed solutions don’t seem to have been implemented yet. The simple workarounds I have come up with so far are:

  • Mix some of the general-domain training data back into the fine-tuning set during domain adaptation, so the model doesn’t forget too much of its previous training (see the data-mixing sketch after this list).

  • Keep two models (one for general texts and one for domain texts). Use a pre-trained text classifier (something like BERT) to detect whether the input text is general-domain or in-domain, then call the general or domain-specific En-Es model based on the classifier output (see the routing sketch below).
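For the first idea, here is a minimal sketch of how the mixed fine-tuning set could be prepared. All file names and the 20% ratio are placeholders, not part of the original post; the idea is just to sample a slice of the general corpus and shuffle it together with the in-domain parallel data before running the adaptation step.

```python
# Sketch of idea 1: mix a random slice of general-domain data into the
# in-domain fine-tuning corpus. File names below are placeholders.
import random

def mix_corpora(domain_src, domain_tgt, general_src, general_tgt,
                out_src, out_tgt, general_ratio=0.2, seed=1234):
    """Write a fine-tuning set containing all in-domain pairs plus a random
    sample of general-domain pairs (general_ratio relative to domain size)."""
    with open(domain_src) as f:
        dom_src = f.readlines()
    with open(domain_tgt) as f:
        dom_tgt = f.readlines()
    with open(general_src) as f:
        gen_src = f.readlines()
    with open(general_tgt) as f:
        gen_tgt = f.readlines()

    random.seed(seed)
    n_general = min(int(len(dom_src) * general_ratio), len(gen_src))
    sampled = random.sample(range(len(gen_src)), n_general)

    pairs = list(zip(dom_src, dom_tgt)) + [(gen_src[i], gen_tgt[i]) for i in sampled]
    random.shuffle(pairs)

    with open(out_src, "w") as fs, open(out_tgt, "w") as ft:
        for s, t in pairs:
            fs.write(s)
            ft.write(t)

mix_corpora("domain.en", "domain.es", "europarl.en", "europarl.es",
            "finetune.en", "finetune.es", general_ratio=0.2)
```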

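And for the second idea, a rough sketch of the routing logic. The classifier checkpoint name (`my-domain-classifier`) is hypothetical and stands for a BERT-style model fine-tuned to output "GENERAL" / "DOMAIN" labels; the two `translate_*` callables are placeholders for however the two En-Es models are served (e.g. two separate OpenNMT REST endpoints).

```python
# Sketch of idea 2: route each sentence to the general or in-domain model
# based on a binary domain classifier. Names below are placeholders.
from transformers import pipeline

# Assumed: a BERT-style classifier fine-tuned to label sentences GENERAL/DOMAIN.
classifier = pipeline("text-classification", model="my-domain-classifier")

def route_and_translate(sentence, translate_general, translate_domain,
                        domain_label="DOMAIN", threshold=0.5):
    """Pick a translation backend per sentence based on the classifier output."""
    pred = classifier(sentence)[0]  # e.g. {"label": "DOMAIN", "score": 0.93}
    if pred["label"] == domain_label and pred["score"] >= threshold:
        return translate_domain(sentence)
    return translate_general(sentence)

# Usage: translate_general / translate_domain would wrap the two exported
# OpenNMT-tf models, e.g. two functions calling separate serving endpoints.
# result = route_and_translate("I ate an apple today",
#                              call_general_api, call_domain_api)
```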
I’m interested to hear from folks who have tried domain adaptation how they have solved this with OpenNMT. Do either of the above solutions sound realistic? I realize there is no golden rule here, so I’m open to any suggestions.

Thanks!


Hi,

The first approach is very common in domain adaptation. It is simple and effective.
The second approach should be the best in terms of actual BLEU score, but it creates new integration challenges.


Thanks @guillaumekln

I might try both approaches in parallel to see which one works out better and will update here.

Hi @mayub, the second idea/approach is very interesting. Do you mind sharing how you managed to implement it using BERT? Thanks in advance!