Support shard training


(Guillaume Klein) #1

Automatically split the training data into several data packages and iterate over them during training.
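To make the idea concrete, here is a minimal sketch of what sharded iteration could look like. It is not OpenNMT's implementation: `iter_shards`, `train_over_shards`, `make_pairs` and `train_step` are hypothetical names chosen for illustration, and the data is assumed to be a stream of (source, target) sentence pairs.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def iter_shards(pairs: Iterable[Pair], shard_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive shards of at most `shard_size` sentence pairs.

    Only one shard is materialized in memory at a time.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    it = iter(pairs)
    while True:
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


def train_over_shards(
    make_pairs: Callable[[], Iterable[Pair]],
    shard_size: int,
    epochs: int,
    train_step: Callable[[int, int, List[Pair]], None],
) -> None:
    """Run `train_step` on every shard, re-reading the data each epoch.

    `make_pairs` is a hypothetical factory returning a fresh stream of
    pairs (for example, by reopening the training files).
    """
    for epoch in range(epochs):
        for shard_id, shard in enumerate(iter_shards(make_pairs(), shard_size)):
            train_step(epoch, shard_id, shard)
```

The point of the design is that memory use is bounded by `shard_size` rather than by the size of the whole corpus.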


(Pdakwale) #2

Hi @guillaumekln, is sharding available in the Lua version now? I want to train on very large training data (both source and target > 10 GB), but preprocessing runs out of memory. Even though I have enough memory allocated, the process is killed because of the memory limit. I have no such problem with the Python version, which supports sharding.


(Guillaume Klein) #3

Hello,

In the end, we implemented a different approach called dynamic dataset, in which training data are processed on the fly.

See the related documentation: http://opennmt.net/OpenNMT/training/sampling/#dynamic-dataset