Continue training from latest shard

Nart · February 20, 2021, 2:27pm

@guillaumekln
Is it possible to track the latest shard used during training and save it in the checkpoints? When training resumes at a later time, it would pick up from that shard.
This option could be enabled when sample_buffer_size is used.

guillaumekln · February 22, 2021, 4:08pm

Not sure we will implement this. I think it would introduce a lot of additional logic for little benefit.

If it is important for you that the training sees every example exactly the same number of times, you can run the training epoch by epoch as shown here:

https://opennmt.net/OpenNMT-tf/faq.html#how-to-count-the-number-of-epochs

If the training is stopped during epoch T, you simply continue from the last checkpoint of epoch T-1.
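
To make the T-1 rule concrete, here is a minimal Python sketch of the bookkeeping. It is not part of OpenNMT-tf: the function name `resume_point`, the `completed` mapping and the checkpoint names are all hypothetical. It assumes you record one checkpoint name each time an epoch finishes, and it only decides which epoch to run next and which checkpoint to restore.

```python
from typing import Dict, Optional, Tuple


def resume_point(completed: Dict[int, str]) -> Tuple[int, Optional[str]]:
    """Return (next epoch to run, checkpoint to restore from).

    `completed` maps a 1-based epoch number to the checkpoint saved when
    that epoch finished (hypothetical bookkeeping, kept by the user).
    Checkpoints written in the middle of an epoch are never recorded here,
    so an interrupted epoch T is always restarted from the end of epoch T-1.
    """
    last_done = 0
    # Walk forward while epochs are contiguous; stop at the first gap.
    while last_done + 1 in completed:
        last_done += 1
    # None means no epoch has finished yet, so training starts from scratch.
    return last_done + 1, completed.get(last_done)


if __name__ == "__main__":
    # Epochs 1 and 2 finished; training was stopped somewhere in epoch 3.
    history = {1: "ckpt-epoch-1", 2: "ckpt-epoch-2"}
    epoch, checkpoint = resume_point(history)
    print(epoch, checkpoint)
```

With the history above, the function says to rerun epoch 3 starting from the checkpoint recorded for epoch 2. Any partial progress inside epoch 3 is discarded, which is what keeps the per-example counts equal.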

Nart · February 23, 2021, 11:06am

Okay, I got it.