Resuming a training can be useful in several contexts:
- Continue an existing training with the same parameters for a few more epochs
- Continue a training on new data, for in-domain adaptation or incremental training
- Change some settings between two runs
The parameter that triggers a training resume is: -train_from
At this point, some parameters describing the model topology are loaded from the checkpoint itself, which makes the following command-line options useless: layers, rnn_size, brnn, brnn_merge, input_feed.
[for developers: you may want to raise a warning / error if the command line includes train_from together with one of these options]
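For instance, a plain resume could look like this (a minimal sketch, assuming the Lua/Torch train.lua entry point; the data and checkpoint file names are hypothetical):

```shell
# Resume training from an existing checkpoint; topology options
# (layers, rnn_size, brnn, ...) are read from the checkpoint,
# not from the command line.
th train.lua -data data/demo-train.t7 \
             -save_model demo-model \
             -train_from demo-model_epoch7.t7
```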
Then there are two options:
- If you want to keep the training settings, set “-continue”. In this mode, the following parameters are loaded from the model checkpoint itself: start_epoch, start_iteration, learning_rate, learning_rate_decay, start_decay_at, optim, optim_state and curriculum.
You can change the data file for incremental / in-domain adaptation, and also set the end_epoch parameter.
[for developers: you may want to raise a warning / error if the command line includes one of these preloaded options]
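A sketch of the “-continue” mode for incremental / in-domain adaptation (file names and the end_epoch value are hypothetical; the optimizer state and learning-rate schedule come from the checkpoint):

```shell
# Continue the same training curve on new in-domain data.
# start_epoch, learning_rate, optim, ... are restored from the checkpoint;
# only the data file and the stopping point change.
th train.lua -data data/in-domain-train.t7 \
             -save_model adapted-model \
             -train_from demo-model_epoch7.t7 \
             -continue \
             -end_epoch 20
```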
- If you want to change the training settings (learning rate, decay, …), you need to start a new training curve.
In this case, you need to set explicitly:
start_epoch ===> @guillaumekln [no check here versus the last epoch run, right?]
start_iteration, learning_rate, learning_rate_decay, start_decay_at, optim, optim_state and curriculum
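Without “-continue”, these options can be overridden to start a new training curve, for example (a hedged sketch; all values and file names are hypothetical):

```shell
# Start a new training curve from the checkpoint weights:
# reset the epoch counter and supply a fresh learning-rate schedule.
th train.lua -data data/demo-train.t7 \
             -save_model demo-model-2 \
             -train_from demo-model_epoch7.t7 \
             -start_epoch 1 \
             -learning_rate 0.5 \
             -learning_rate_decay 0.7 \
             -start_decay_at 5 \
             -optim sgd
```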
[I am unclear about a few other options; I will EDIT this section when I know more.]