If we train for N epochs of n minibatches of sentences, i.e. N x n minibatches in total,
I would like to be able to start training with a dropout schedule as follows:
(these are examples; we may generalize with a variable value for the percentages)
between 0% and 25% of the minibatches, dropout changes linearly from a value a(0%) to a(25%)
between 25% and 50% of the minibatches, dropout changes linearly from a(25%) to a(50%)
and likewise between 50% and 75%, then between 75% and 100%.
-dropout-schedule could be some kind of string describing such a schedule.
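To make this concrete, here is a minimal sketch (in Python, purely illustrative; the function name dropout_at and the breakpoint values are placeholders, not an existing option) of how such a piecewise-linear schedule could be evaluated from the fraction of minibatches seen so far:

```python
def dropout_at(progress, breakpoints):
    """Evaluate a piecewise-linear dropout schedule.

    progress    -- fraction of training completed, in [0, 1]
                   (current minibatch index / (N * n))
    breakpoints -- sorted list of (progress, dropout) pairs, e.g.
                   [(0.00, a0), (0.25, a25), (0.50, a50), (0.75, a75), (1.00, a100)]
    """
    # Clamp to the first/last value outside the defined range.
    if progress <= breakpoints[0][0]:
        return breakpoints[0][1]
    if progress >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    # Find the segment containing `progress` and interpolate linearly within it.
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= progress <= x1:
            t = (progress - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
```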
Thanks Vincent for the detailed request. We want to implement a more general notion of a “training schedule” (I like the name) that we can use to drive all the parameters during training - typically the optim method and learning_rate, but also guided alignment, the number of parallel threads, boosting parameters, dropout, and any other parameter that can change during training. Also, we are considering dropping the notion of “epoch” and moving to “steps” as other systems do. I will put some specs together and come back to you.
nope, it did not imply anything; just linear interpolation between the two values for each segment.
here is my belief:
If we start with a high dropout, it’s too slow to converge.
But we need a higher dropout at some point, and then I think decreasing it again helps to get better performance.
So, for example, I would like to try this:
we start at 0, increase to 0.4, stay at 0.4 for some time, then decrease again down to X (maybe 0).
it seems to give some interesting results in another context.
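For instance, that warm-up / hold / decay schedule could be written as breakpoints and evaluated with plain linear interpolation (numpy.interp here; the 30% / 70% turning points are just an illustration, not values from the request):

```python
import numpy as np

# Hypothetical breakpoints: warm up from 0 to 0.4 over the first 30% of
# training, hold at 0.4 until 70%, then decay back toward 0 (X = 0) by the end.
xp = [0.0, 0.3, 0.7, 1.0]   # fraction of N * n minibatches seen
fp = [0.0, 0.4, 0.4, 0.0]   # dropout value at each breakpoint

progress = 0.85             # e.g. 85% of training done
print(np.interp(progress, xp, fp))  # halfway through the decay, ~0.2
```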