My training data is relatively large, though: train.1.pt is 4.2G and valid.1.pt is 185M. The source sequences are capped at 400 words.
I tried setting valid_batch_size manually, but it didn't work. Should I decrease my training batch size instead? My last resort is to cap the source sequence length at 300 words. Any suggestions?
You need to shard your data set.
Currently the option is -max_shard_size, but we will switch to shard_size for text (at the moment that one is only valid for image and audio).
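If it helps, re-running preprocess.py with sharding enabled looks roughly like this; the paths and the shard size value are placeholders, and flag behaviour can differ between OpenNMT-py versions, so check `python preprocess.py -h` for your install:

```bash
# Split the corpus into shards at preprocessing time so training can load
# one shard at a time instead of a single 4.2G train.1.pt.
# Paths and the shard size below are placeholders; -src_seq_length 400
# matches the 400-word source cap mentioned above.
python preprocess.py \
    -train_src data/train.src -train_tgt data/train.tgt \
    -valid_src data/valid.src -valid_tgt data/valid.tgt \
    -save_data data/demo \
    -src_seq_length 400 \
    -max_shard_size 67108864
```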
Hi Vince, thank you for responding. I thought sharding (-max_shard_size) only made a difference for preprocessing (preprocess.py), i.e., only made preprocessing faster. Does it also affect memory usage during training and validation?
Wow, a valid batch size of 32 is already very small, but when I lowered it to 16, it ran even on my original dataset (max src seq length = 400, training batch_size = 4096). Thanks for the help!
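For anyone who hits the same issue, the training invocation that works for me now is roughly the following; paths are placeholders, and -batch_type tokens is my assumption since 4096 is a token-level batch size:

```bash
# Training settings that fit in memory on my data.
# -valid_batch_size 16 is what resolved the OOM during validation.
# -batch_type tokens is assumed here for a token-level batch_size of 4096;
# drop it if you batch by sentences.
python train.py \
    -data data/demo \
    -save_model models/demo \
    -batch_size 4096 -batch_type tokens \
    -valid_batch_size 16
```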