First of all, I would like to thank the OpenNMT community for building and sharing such a great NMT implementation.
I am running into an issue when training a system from ~70M parallel sentences with OpenNMT-torch. When I run preprocess.lua with the parameters below, it consumes almost 100GiB of memory. This is problematic: the memory footprint exceeds our machine's capacity, so the process starts swapping and slows down dramatically.
Has anyone experienced a similar issue and found a solution? I see that preprocess.py in OpenNMT-pytorch has a “-max_shard_size” option that may be useful, but I cannot find a similar option in OpenNMT-torch.
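For reference, this is roughly how I understand the sharding option is invoked on the PyTorch side; the file names and the shard size value here are just placeholders for illustration, not my actual setup:

```
# Illustrative only: OpenNMT-py preprocessing with sharding,
# so the corpus is processed in chunks instead of all at once.
python preprocess.py \
    -train_src train.src -train_tgt train.tgt \
    -valid_src valid.src -valid_tgt valid.tgt \
    -save_data data/demo \
    -max_shard_size 524288000   # shard size in bytes (example value)
```

I am looking for an equivalent way to bound the memory usage of preprocess.lua, or any other workaround people have used for corpora of this size.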