Automatically split the training data into several data packages and iterate over them during training.
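For intuition, this sharding idea can be sketched in a few lines of Python. The file names and shard size below are hypothetical examples, not part of any actual OpenNMT interface:

```python
# Minimal sketch of data sharding: split a large parallel corpus into
# fixed-size shards and iterate over them during training, so only one
# shard needs to fit in memory at a time.
# File paths and shard size are hypothetical.
from itertools import islice

def iter_shards(src_path, tgt_path, shard_size=100000):
    """Yield successive shards of (source, target) sentence pairs."""
    with open(src_path) as src, open(tgt_path) as tgt:
        while True:
            shard = list(islice(zip(src, tgt), shard_size))
            if not shard:
                break
            yield shard

# The training loop would then process one shard at a time:
# for epoch in range(num_epochs):
#     for shard in iter_shards("train.src", "train.tgt"):
#         train_on(shard)
```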
Hi @guillaumekln, is sharding available now in the Lua version? I want to train with very large training data (both source and target > 10 GB), but it runs out of memory during preprocessing. Even though I have enough memory allocated, the process is killed due to the memory limit. There is no problem with the Python version, which supports sharding.
We finally implemented a different approach called the dynamic dataset, where training data is processed on the fly.
See the related documentation: http://opennmt.net/OpenNMT/training/sampling/#dynamic-dataset
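For intuition only, the on-the-fly idea could be sketched as follows. All names here are illustrative, not the actual OpenNMT API: instead of preprocessing the whole corpus up front, each epoch samples a subset of examples and reads and tokenizes only those:

```python
# Sketch of a dynamic dataset: sample a subset of line indices each
# epoch, then read and tokenize only those examples on the fly.
# Function names and parameters are illustrative, not OpenNMT's API.
import random

def sample_examples(src_path, tgt_path, sample_size, seed=0):
    """Yield a random sample of tokenized (source, target) pairs."""
    rng = random.Random(seed)
    with open(src_path) as f:
        n_lines = sum(1 for _ in f)
    chosen = set(rng.sample(range(n_lines), min(sample_size, n_lines)))
    with open(src_path) as src, open(tgt_path) as tgt:
        for i, (s, t) in enumerate(zip(src, tgt)):
            if i in chosen:
                # Tokenization happens here, at training time,
                # rather than in a separate preprocessing pass.
                yield s.strip().split(), t.strip().split()
```

Because nothing is materialized up front, memory usage stays bounded by the sample rather than by the full corpus.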