Domain Specific Training Questions (Transfer Learning?)

Hi,

I’m working on building a chatbot using conversational data. I have a domain-specific corpus of 40K utterances. I tried training on only this, but the results were poor (as expected). I would like to try training on a much larger corpus first, then continue training the model on the domain-specific data.

I have some questions about the process:

  1. Would it be beneficial to generate the vocab file using both the parent dataset and the domain-specific dataset? I am using SentencePiece.

  2. Do I train the parent model until convergence, then continue training on the domain-specific data until that converges?

  3. If anyone has done this before, do you have any advice?

Thanks

Hi,

I think you got it right.

  1. You could use as much data as possible to build the vocabulary, including in-domain data.
  2. Yes, but there are 2 points to keep in mind:
    a. You should not train the parent model so long that the learning rate becomes too small to make any further improvements.
    b. When continuing the training, you might want to mix the domain data with some generic data as well if you want to retain good quality on generic inputs.
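
To illustrate point 2b, here is a minimal sketch of mixing the domain data with a sample of generic data before continuing training. The function name `mix_corpora`, the `generic_ratio` parameter, and its default of 0.25 are hypothetical choices for illustration, not values recommended in this thread. Each example can be a (source, target) pair so that the two sides stay aligned.

```python
import random


def mix_corpora(domain, generic, generic_ratio=0.25, seed=0):
    """Return all domain examples plus a random sample of generic ones, shuffled.

    generic_ratio is the (hypothetical) fraction of the final mix that should
    come from the generic corpus. 0.25 is an arbitrary starting point.
    """
    if not 0.0 <= generic_ratio < 1.0:
        raise ValueError("generic_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of generic examples needed so they form generic_ratio of the mix.
    n_generic = round(len(domain) * generic_ratio / (1.0 - generic_ratio))
    # Never ask for more generic examples than are available.
    n_generic = min(n_generic, len(generic))
    mixed = list(domain) + rng.sample(list(generic), n_generic)
    rng.shuffle(mixed)
    return mixed
```

The right ratio probably depends on how much generic quality you want to keep, so it is worth trying a few values and checking both a domain and a generic validation set.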

Thanks for the advice. What range of learning rate would be considered too small?
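
For context on point 2a, here is a rough sketch of how a decay schedule shrinks the learning rate as training goes on. The thread does not say which schedule or toolkit is in use, so this assumes the inverse-square-root ("Noam") schedule from the Transformer paper. The function name and the default values for `model_dim`, `warmup_steps`, and `scale` are hypothetical. It does not define a threshold for "too small"; it only shows how to inspect the rate at a given step.

```python
def noam_learning_rate(step, model_dim=512, warmup_steps=8000, scale=2.0):
    """Learning rate at a given step under inverse-square-root decay.

    Assumed schedule (not confirmed by the thread):
      scale * model_dim^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    The rate rises linearly during warmup, then decays as 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few checkpoints to see how far it has decayed.
    for step in (8000, 32000, 128000, 512000):
        print(step, noam_learning_rate(step))
```

Under this schedule, quadrupling the step count after warmup halves the learning rate, so a parent model trained for a very long time hands over a much smaller rate to the fine-tuning stage.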