I am training a Transformer model on Google Colab. My training dataset is split into shards numbered train.shard.0 to train.shard.41, and I cannot get through all of them within the 12 hours that a free Colab session lasts: by the time the session ends, training has only reached train.shard.30. How can I resume training from a later shard? If I use the -train_from flag, training always restarts by loading shard number 0.
Can I rename the files (for example, swap train.shard.41 and train.shard.0) so that all of my training data gets used?
Unless you are willing to modify the code, the easiest workaround is to rename the shards.
Just pay attention to the ordering: the shards are sorted alphanumerically, not numerically (not optimal, I know), so train.shard.10 comes before train.shard.2.
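A minimal sketch of the renaming workaround, assuming (as the answer implies) that the trainer visits shard files in plain string-sorted order and that the files follow the `train.shard.N` pattern from the question. The helpers `rotation_plan` and `apply_plan` are hypothetical, not part of OpenNMT. The idea is to keep the same set of file names but reassign them so that the shards not yet seen come first in string order:

```python
import os

def rotation_plan(names, last_done):
    """Return (old_name, new_name) pairs so that the shards after `last_done`
    in string-sorted order are visited first on the next run.
    Assumption: the trainer loads shards in sorted(names) order."""
    lex = sorted(names)                      # assumed loading order (string sort)
    k = lex.index(last_done)                 # position of the last shard already used
    wanted = lex[k + 1:] + lex[:k + 1]       # unseen shards first, then the rest
    # The i-th file we want to visit takes the i-th name in string order,
    # so the set of file names on disk stays exactly the same.
    return [(old, new) for old, new in zip(wanted, lex) if old != new]

def apply_plan(directory, plan):
    """Rename in two passes through temporary names so no file is overwritten."""
    for old, _ in plan:
        os.rename(os.path.join(directory, old),
                  os.path.join(directory, old + ".tmp"))
    for old, new in plan:
        os.rename(os.path.join(directory, old + ".tmp"),
                  os.path.join(directory, new))
```

For the case in the question, `rotation_plan(names, "train.shard.30")` makes the file currently called train.shard.31 the new train.shard.0. Because of the string ordering, and assuming the trainer does sort this way, the shards still unseen after train.shard.30 are 31 to 39, 4, 40, 41 and 5 to 9, not simply 31 to 41. Running it on a copy of the data directory first is the safer choice.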