Train Transformer model on Google Colab

AndreaM · May 6, 2019, 8:50am

I am training a Transformer model on Google Colab. Since my training dataset has 41 shards, I cannot complete a full cycle (from train.shard.0 to train.shard.41) within the 12 hours that Google provides for free. After 12 hours I get as far as train.shard.30. How can I continue training starting from another shard? If I use the -train_from flag, the model always restarts by loading shard number 0.
Can I rename the files (e.g. train.shard.41 to train.shard.0 and vice versa) so that all of my training data gets used?

vince62s · May 6, 2019, 6:27pm

Unless you are willing to modify the code, the easiest tweak is to rename the shards.
Just pay attention to the order in which they are read: it is alphanumeric, not numeric (not optimal, I know), so a name ending in .10 sorts before one ending in .2.
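
For anyone wanting to script the renaming, here is a minimal sketch in Python. It assumes the loader picks up shards in plain string-sorted order (my reading of "alphanumeric" above) and that you know which shards were already consumed before the session ended. The helper names plan_renames and apply_renames are hypothetical and not part of any library, and the train.shard.N names simply follow the ones used in the question, so adjust them to your actual files.

```python
import os


def plan_renames(shard_names, seen):
    """Return {old_name: new_name} so that unseen shards take the earliest slots.

    Assumption: the training code reads shards in plain string-sorted order.
    """
    slots = sorted(shard_names)  # string sort: "...10" comes before "...2"
    unseen = [name for name in slots if name not in seen]
    done = [name for name in slots if name in seen]
    # Unseen shards get the first names in sorted order, seen shards the rest.
    return dict(zip(unseen + done, slots))


def apply_renames(directory, plan):
    """Rename in two passes through temporary names so nothing is overwritten."""
    for old in plan:
        src = os.path.join(directory, old)
        os.rename(src, src + ".tmp")
    for old, new in plan.items():
        os.rename(os.path.join(directory, old + ".tmp"),
                  os.path.join(directory, new))
```

Usage would look like apply_renames("data", plan_renames(os.listdir("data"), seen)), where seen is the set of shard names already loaded in the previous session and "data" contains only shard files. Back up the directory first, since the renames are done in place.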