Is it possible to track the latest shard used during training and save it in the checkpoints? When training resumes at a later time, it would then pick up from that shard.
This option could be enabled when sample_buffer_size is used.
Not sure we will implement this. I think it would introduce a lot of additional logic for little benefit.
If it is important for you that training sees every example exactly the same number of times, you can run the training epoch by epoch as shown here:
If the training is stopped during epoch T, you simply continue from the last checkpoint of epoch T-1.
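To make the epoch-by-epoch resumption concrete, here is a minimal, self-contained Python sketch of the idea; the helper names (`last_completed_epoch`, `train`) and the JSON checkpoint format are illustrative assumptions, not any library's actual API. The key point is that a checkpoint is only written at an epoch boundary, so restarting always resumes from the last fully completed epoch.

```python
import json
import os
import tempfile

def last_completed_epoch(ckpt_dir):
    """Return the highest epoch with a saved checkpoint, or 0 if none exist."""
    epochs = [int(name.split("-")[1].split(".")[0])
              for name in os.listdir(ckpt_dir) if name.startswith("epoch-")]
    return max(epochs, default=0)

def train(ckpt_dir, total_epochs):
    """Run training up to total_epochs, resuming after the last full epoch."""
    start = last_completed_epoch(ckpt_dir) + 1
    for epoch in range(start, total_epochs + 1):
        # ... one full pass over all data shards would go here ...
        # Checkpoint only at the epoch boundary, so a partially
        # completed epoch T is simply redone from scratch on restart.
        with open(os.path.join(ckpt_dir, f"epoch-{epoch}.json"), "w") as f:
            json.dump({"epoch": epoch}, f)
    return start  # epoch this run started from

ckpt_dir = tempfile.mkdtemp()
train(ckpt_dir, 3)               # first run completes epochs 1..3
resumed_at = train(ckpt_dir, 5)  # "restart": resumes at epoch 4, runs 4..5
print(resumed_at)  # 4
```

With this scheme no shard bookkeeping is needed inside the checkpoint itself: if training dies mid-epoch T, the most recent checkpoint is the one from epoch T-1, and rerunning the loop repeats epoch T in full.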
Okay, I got it.