Support shard training

(Guillaume Klein) #1

Automatically split the training data into several data packages and iterate over them during training.
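
For readers who want a concrete picture of the idea, here is a minimal sketch in Python. It is not OpenNMT code: `iter_shards`, `train_on_shards`, and `train_step` are hypothetical names chosen for illustration, and the sketch assumes the corpus is available as an iterable of (source, target) sentence pairs.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def iter_shards(pairs: Iterable[Pair], shard_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive lists of at most `shard_size` sentence pairs."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    it = iter(pairs)
    while True:
        # Only one shard is materialized in memory at a time.
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


def train_on_shards(
    pairs: Iterable[Pair],
    shard_size: int,
    train_step: Callable[[List[Pair]], None],
) -> int:
    """Run `train_step` (a hypothetical callback) on each shard in turn.

    Returns the number of shards processed in this single pass.
    """
    count = 0
    for shard in iter_shards(pairs, shard_size):
        train_step(shard)
        count += 1
    return count
```

Because `iter_shards` consumes its input lazily, the pairs can come from a generator that reads files line by line, so peak memory depends on `shard_size` rather than on the size of the whole corpus.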

(Pdakwale) #2

Hi @guillaumekln, is sharding available in the Lua version now? I want to train on a very large dataset (both source and target are larger than 10 GB), but preprocessing runs out of memory. Although I have allocated what should be enough memory, the process is killed for exceeding the memory limit. The Python version with sharding works without problems.

(Guillaume Klein) #3


We finally implemented a different approach called dynamic dataset where training data are processed on the fly.
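
To illustrate what "processed on the fly" can mean in general, here is a rough Python sketch. It is not the Lua implementation and does not reflect its options: `stream_batches`, `batch_size`, and `max_len` are hypothetical names, and whitespace tokenization is an assumption made only to keep the example short.

```python
from typing import Iterator, List, TextIO, Tuple

TokenPair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def stream_batches(
    src: TextIO, tgt: TextIO, batch_size: int, max_len: int = 50
) -> Iterator[List[TokenPair]]:
    """Read two aligned text streams line by line and yield batches.

    Tokenization and length filtering happen while reading, so no
    preprocessed copy of the full corpus is ever held in memory.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[TokenPair] = []
    for src_line, tgt_line in zip(src, tgt):
        src_tok, tgt_tok = src_line.split(), tgt_line.split()
        # Skip empty or overly long pairs (an assumed filtering rule).
        if not src_tok or not tgt_tok:
            continue
        if len(src_tok) > max_len or len(tgt_tok) > max_len:
            continue
        batch.append((src_tok, tgt_tok))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

In a sketch like this, `src` and `tgt` would typically be file objects opened on the raw source and target files, so memory use is bounded by the batch size instead of the corpus size.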

See the related documentation: