Changing the behaviour of the `end_epoch` option when used in combination with the `train_from` and `continue` options

Using the `train_from` and `continue` options makes it possible to resume training from an existing model.
The latest epoch number is retrieved automatically, so there is no need to specify it with the `start_epoch` option. However, we still need to specify the final epoch up to which training should continue, using the `end_epoch` option.
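
For example (the epoch numbers here are just illustrative), a resumed run could look like:

```
th train.lua [...] -end_epoch 13
th train.lua [...] -train_from model.t7 -continue -end_epoch 20
```

The second command picks up at epoch 14 and trains through epoch 20.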
Now, let’s say I want to train for n more epochs. Currently, I have to know the latest epoch number to compute the right `end_epoch` value. It would be useful if, when the `continue` option is present, `end_epoch` were interpreted as “this many more epochs”.

What do you think about this proposal?

In fact, reading the changelog, it seems that `end_epoch` used to be named `epochs`. I don’t know which would be clearer: adding a new option to specify how many epochs we want to train from the start, or changing the behaviour of `end_epoch` when used with `continue`?

I think we should add a new `epochs` option to avoid confusion. If its value is > 0, it takes priority over `end_epoch`.
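
A minimal sketch of how that priority could be resolved (hypothetical Lua, not the actual train.lua code; `resolveEndEpoch`, the `opt` field names, and `startEpoch` are assumptions for illustration):

```lua
-- Hypothetical sketch of the proposed option priority.
-- 'startEpoch' is the first epoch of this run: 1 for a fresh start,
-- or the checkpoint's epoch + 1 when resuming with -continue.
local function resolveEndEpoch(opt, startEpoch)
  if opt.epochs and opt.epochs > 0 then
    -- '-epochs N' means "train N more epochs", overriding '-end_epoch'.
    return startEpoch + opt.epochs - 1
  end
  return opt.end_epoch
end
```

With `-epochs 1`, every invocation trains exactly one more epoch, whatever the checkpoint’s epoch number is.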

Training epoch by epoch would then look like this:

```
th train.lua [...] -epochs 1
th train.lua [...] -train_from model.t7 -continue -epochs 1
th train.lua [...] -train_from model.t7 -continue -epochs 1
...
```

Guys,

I would like to take this opportunity to question the epoch concept again, and maybe try to move to “iterations” or “steps” like most other projects do.
Just saying.

Vincent

With data (or file) sampling, we actually changed the definition of an epoch from a pass over the whole dataset to a number of steps after which evaluation and learning rate updates are performed. So in a way, we support both the epoch and step worlds.
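
As a hedged illustration (the sample size, corpus size, and batch size below are made up): with data sampling, an “epoch” becomes a fixed-size slice of the corpus, which is effectively a step count:

```
# one "epoch" = 1M sampled sentences instead of a full 10M-sentence pass;
# with batches of 64 sentences, that is roughly 15625 steps per epoch
th train.lua [...] -sample 1000000
```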

But I agree that using steps only is clearer and easier to manage. We don’t plan to make a full switch in OpenNMT (Lua), but it could happen elsewhere. :wink: