Support shard training

guillaumekln · January 13, 2017, 1:56pm

Automatically split the training data into several data packages and iterate over them during training.
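As a rough illustration of the idea, the sketch below reads aligned source/target lines lazily and yields them in shards of at most `shard_size` pairs, so only one shard needs to be held in memory at a time. The function name `iter_shards` and its parameters are hypothetical and do not come from OpenNMT. It assumes the two inputs are line-aligned (one sentence per line); `zip` silently stops at the shorter input.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def iter_shards(src_lines: Iterable[str], tgt_lines: Iterable[str],
                shard_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield consecutive shards of at most `shard_size` aligned sentence pairs.

    Hypothetical helper: inputs can be open file objects, so lines are read
    on demand and at most one shard is kept in memory.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be >= 1")
    pairs = zip(src_lines, tgt_lines)  # lazy: nothing is read until iterated
    while True:
        # Pull the next `shard_size` pairs from the shared iterator.
        shard = [(s.rstrip("\n"), t.rstrip("\n"))
                 for s, t in islice(pairs, shard_size)]
        if not shard:
            return
        yield shard


if __name__ == "__main__":
    # In-memory demo; with real data, pass open(...) file objects instead.
    src = ["a\n", "b\n", "c\n"]
    tgt = ["x\n", "y\n", "z\n"]
    for epoch in range(2):
        for i, shard in enumerate(iter_shards(src, tgt, shard_size=2)):
            print(epoch, i, shard)  # a training step would consume the shard here
```

For an actual corpus, opening both files inside a `with` block for each epoch and passing the file objects to `iter_shards` keeps memory bounded by the shard size rather than by the corpus size.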

pdakwale · June 16, 2018, 1:04pm

Hi @guillaumekln, is sharding now available in the Lua version? I want to train on very large training data (both source and target > 10 GB), but preprocessing runs out of memory. Even though I have allocated enough memory, the process is killed for exceeding the memory limit. I have no such problem with the Python version, which supports sharding.

guillaumekln · June 16, 2018, 6:38pm

Hello,

We ultimately implemented a different approach, called dynamic dataset, in which the training data are processed on the fly.

See the related documentation: http://opennmt.net/OpenNMT/training/sampling/#dynamic-dataset
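For readers unfamiliar with the general idea, here is a minimal, generic sketch of processing training data on the fly: each sentence pair is tokenized, filtered and batched only when it is needed, instead of preprocessing the whole corpus up front. This is not OpenNMT's implementation or API; the name `stream_batches`, its parameters, the whitespace tokenizer and the length filter are all assumptions for illustration. The linked documentation describes how the actual dynamic dataset is configured.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[List[str], List[str]]


def stream_batches(src_lines: Iterable[str], tgt_lines: Iterable[str],
                   batch_size: int, max_len: int = 50) -> Iterator[List[Pair]]:
    """Hypothetical helper: tokenize, filter and batch pairs lazily, one at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[Pair] = []
    for src, tgt in zip(src_lines, tgt_lines):
        # Whitespace splitting stands in for a real tokenizer (assumption).
        src_tok, tgt_tok = src.split(), tgt.split()
        # Skip empty or overly long pairs, as an offline preprocessing step might.
        if (not src_tok or not tgt_tok
                or len(src_tok) > max_len or len(tgt_tok) > max_len):
            continue
        batch.append((src_tok, tgt_tok))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


if __name__ == "__main__":
    src = ["a b\n", "\n", "c\n", "d e f\n"]
    tgt = ["x\n", "y\n", "z\n", "w\n"]
    for batch in stream_batches(src, tgt, batch_size=2, max_len=2):
        print(batch)
```

Because nothing is materialized beyond the current batch, peak memory no longer grows with the corpus size, which is the failure mode described in the question above.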