Changing the behaviour of the `end_epoch` option when used in combination with the `train_from` and `continue` options

(Aurélien Coquard) #1

Using the `train_from` and `continue` options makes it possible to resume training from an existing model.
The latest epoch number is retrieved from the checkpoint, so there is no need to specify it with the `start_epoch` option. However, we still need to specify the final epoch until which we want the training to continue, which is what the `end_epoch` option is for.
Now, let's say I want to train n more epochs. I have to know what the latest epoch number is. It would be useful if, when the `continue` option is present, `end_epoch` were interpreted as "this many more epochs".
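For example (the epoch numbers here are made up for illustration), if the last checkpoint stopped at epoch 13 and I want 7 more epochs, I first have to work out the absolute target myself:

th train.lua [...] -train_from model.t7 -continue -end_epoch 20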

What do you think of this proposal?

(Aurélien Coquard) #2

In fact, reading the changelog, it seems that `end_epoch` used to be named `epochs`. I don't know which would be clearer: adding a new option to say how many epochs we want to train from the starting epoch, or changing the behaviour of `end_epoch` when used with `continue`?

(Guillaume Klein) #3

I think we should add a new option `epochs` to avoid confusion. If its value is > 0, it takes priority over `end_epoch`.

Training epoch by epoch would then look like this:

th train.lua [...] -epochs 1
th train.lua [...] -train_from model.t7 -continue -epochs 1
th train.lua [...] -train_from model.t7 -continue -epochs 1
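Internally, the resolution could be as simple as this (a sketch only; `opt.epochs`, `opt.end_epoch` and `startEpoch` are assumed names, not necessarily what train.lua uses):

-- Sketch: choose the final epoch, giving -epochs priority over -end_epoch.
-- startEpoch is 1 for a fresh run, or the retrieved epoch + 1 with -continue.
local finalEpoch
if opt.epochs > 0 then
  finalEpoch = startEpoch + opt.epochs - 1  -- "this many more epochs"
else
  finalEpoch = opt.end_epoch                -- absolute target, as today
end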

(Vincent Nguyen) #4


I would like to take this opportunity to question the epoch concept again, and maybe try to move to "iterations" or "steps" like most other projects do.
Just saying.


(Guillaume Klein) #5

With the data (or file) sampling, we actually changed the definition of an epoch from a pass over the whole dataset to a number of steps after which to perform evaluation and learning rate updates. So in a way, we support both the epoch and step worlds.
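For example (illustrative numbers only): with a 1M-sentence training set and a sampling size of 100,000 sentences per epoch, an "epoch" is really a fixed amount of work, and roughly ten such epochs correspond to one full pass over the data.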

But I agree that using steps only is clearer and easier to manage. We don't plan to make a full switch in OpenNMT (Lua), but it could happen elsewhere. :wink: