Dear fantastic support!
I would like to implement a scheduled dropout.
Ideally here is what I would like:
If we train for N epochs of n minibatches of sentences, which means N x n minibatches in total,
I would like to be able to start training with a dropout schedule as follows
(these are examples, but we could generalize with variable values for the percentages):
between 0% and 25% of the minibatches, the dropout changes linearly from a value a(0%) to a(25%);
between 25% and 50% of the minibatches, it changes linearly from a(25%) to a(50%);
and likewise between 50% and 75%, then between 75% and 100%.
-dropout-schedule could be a string encoding such a schedule.
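To make the idea concrete, here is a minimal sketch of how such a schedule string could be parsed and evaluated. This is not an existing option; the string format (`fraction:dropout` pairs separated by commas) and the names `parse_schedule` / `dropout_at` are assumptions made up for illustration.

```python
def parse_schedule(spec):
    """Parse a hypothetical schedule string, e.g. "0:0.1,0.25:0.3,0.5:0.4,0.75:0.3,1:0.1",
    into sorted (fraction_of_minibatches, dropout_value) anchor pairs."""
    anchors = []
    for part in spec.split(","):
        frac, val = part.split(":")
        anchors.append((float(frac), float(val)))
    return sorted(anchors)

def dropout_at(progress, anchors):
    """Piecewise-linear interpolation between anchors.

    progress: fraction of total minibatches completed, in [0, 1].
    anchors:  sorted (fraction, dropout) pairs covering 0.0 .. 1.0.
    """
    for (f0, d0), (f1, d1) in zip(anchors, anchors[1:]):
        if f0 <= progress <= f1:
            t = (progress - f0) / (f1 - f0)
            return d0 + t * (d1 - d0)
    return anchors[-1][1]  # past the last anchor: hold the final value
```

For example, with the string above, at 12.5% of the minibatches the dropout would be halfway between a(0%)=0.1 and a(25%)=0.3, i.e. 0.2; the trainer would call `dropout_at` before each minibatch and set the dropout layers accordingly.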
I hope this request is clear enough.
Thanks Vincent for the detailed request. We want to implement, more generally, a notion of "training schedule" (I like the name) that we can use to drive all the parameters during training - typically the optimization method and learning rate, but also guided alignment, the number of parallel threads, boosting parameters, dropout, and any other parameter that can change during training. Also, we are considering dropping the notion of "epoch" and moving to "steps", as other systems are doing. I will put some specs together and come back to you.
Your description implies that the dropout is increasing over the course of the training. Did you mean the opposite?
Nope, it did not imply anything: just linear between two values for each segment.
Here is my belief:
if we start with a high dropout, convergence is too slow.
But we need a higher dropout at some point, and then I think decreasing it again helps to get better performance.
So, for example, I would like to try this:
start at 0, increase to 0.4, stay at 0.4 for some time, then decrease again down to X (maybe 0).
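That ramp-up / hold / decay shape could be sketched as a single function of training progress. The function name and default breakpoints (ramp up over the first 25%, hold until 75%) are hypothetical values chosen only to illustrate the shape described above.

```python
def trapezoid_dropout(step, total_steps, peak=0.4, final=0.0,
                      ramp_up=0.25, hold_until=0.75):
    """Dropout that ramps 0 -> peak, holds at peak, then decays to final.

    step / total_steps gives the fraction of training completed;
    ramp_up and hold_until are fractions splitting the three phases.
    """
    p = step / total_steps
    if p < ramp_up:                       # phase 1: linear ramp 0 -> peak
        return peak * p / ramp_up
    if p < hold_until:                    # phase 2: hold at peak
        return peak
    # phase 3: linear decay peak -> final
    return peak + (final - peak) * (p - hold_until) / (1.0 - hold_until)
```

With the defaults, dropout is 0 at the start, 0.4 from 25% to 75% of the steps, and back to 0 at the end; passing `final=X` gives the variant that decreases to some other value X.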
It seems to give some interesting results in another context.