Memory optimization

vince62s · November 7, 2017, 12:57pm

As of today, batch size is defined as a number of segments.
Incidentally, this limits the maximum sequence length, since all timesteps of every sequence in the batch have to fit in memory.

When dealing with BPE/subwords or character-level modeling, this really becomes an issue.

Some other toolkits define the batch size as a number of tokens per batch.
They can therefore handle a few very long sequences or many very short sequences in a single batch.

This would not only make it possible to handle much longer sequences but also make better use of memory.
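To make the idea concrete, here is a minimal Python sketch of token-based batching. It is not taken from OpenNMT or any other toolkit: the function name `batch_by_tokens` and the `max_tokens` parameter are hypothetical, and the padded cost of a batch is assumed to be the number of sequences times the longest sequence in it.

```python
from typing import Iterable, Iterator, List, Sequence


def batch_by_tokens(
    examples: Iterable[Sequence[str]], max_tokens: int
) -> Iterator[List[Sequence[str]]]:
    """Group examples so that each padded batch holds at most max_tokens tokens.

    Hypothetical helper for illustration only. A single example longer than
    max_tokens is still emitted, alone in its own batch.
    """
    # Sorting by length keeps similar lengths together and limits padding.
    ordered = sorted(examples, key=len)
    batch: List[Sequence[str]] = []
    longest = 0
    for example in ordered:
        new_longest = max(longest, len(example))
        # Assumed padded cost: number of sequences * longest sequence.
        if batch and new_longest * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_longest = [], len(example)
        batch.append(example)
        longest = new_longest
    if batch:
        yield batch
```

With a budget of 8 tokens, four 2-token segments would end up together in one batch while a single 8-token segment would get a batch of its own, so the memory footprint per batch stays roughly constant.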

PR welcome! I know…

kdminamoto · June 12, 2019, 3:31pm

For OpenNMT-tf, I don’t know if it is a good solution, but how about letting users decide whether to enable swap_memory, which is a parameter of tf.contrib.seq2seq.dynamic_decode, tf.nn.dynamic_rnn, etc.?

guillaumekln · June 18, 2019, 6:53pm

This will actually be the default in TensorFlow 2.0. Do you have some experience with this flag?

kdminamoto · June 20, 2019, 2:44pm

Yes, I tried it before.

When doing character-level training, I could use a slightly larger batch size (10→20), but host memory usage was a disaster: about 90GB VSZ and 40GB RSS.