When I train my model, the learning rate increases at a constant rate until 0.001, then decreases at a constant rate afterwards. Why is this? Why isn’t the learning rate set to be constant?
At what stage of the learning rate would I want to switch to my domain-specific corpus for transfer learning? My parent corpus is 2.8M, and my domain specific has 40K. I was going to switch over when the learning rate is decreasing and around 0.0002, but I really have no idea how to choose when. For reference, the learning rate peaks at 0.001 (edited my original post to reflect this).
I think it’s easier to reason in number of training steps. Maybe start with 100k training steps on generic data before starting the adaptation. As you gain more experience, you will probably find a better value.
If I wait until 100K steps, the learning rate will have decreased to the minimum (0.000001). Is it okay to start training the domain-specific corpus at the minimum learning rate?