Although OpenNMT uses ‘even source batches’ (i.e. all sequences in a source minibatch have the same length) as the default batch configuration, I need ‘uneven source batches’ (i.e. sequences in a source minibatch have different lengths) for a specific purpose.
When training with uneven source batches, OpenNMT provides :maskPadding(),
e.g. in Seq2Seq.lua, line 159:
if batch.uneven then
  self.models.decoder:maskPadding(batch.sourceSize, batch.sourceLength)
end
However, this function constructs a new nn.MaskedSoftmax() every time a new batch comes in.
And constructing this module includes constructing submodules (nn.Narrow(), nn.Padding()) for each sentence in the minibatch.
My question is:
How much overhead does training in ‘uneven batch’ mode produce?
(I wonder whether this overhead is negligible or not.)
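For context, here is a rough, hypothetical micro-benchmark sketch of the kind of per-batch construction work involved. It is not the real onmt MaskedSoftmax implementation; the module structure and the batchSize / sourceLength / numBatches values are just assumptions for illustration:

-- Hypothetical stand-in for the per-batch construction cost (not OpenNMT code).
-- For every simulated batch it builds one Narrow/SoftMax/Padding pipeline per
-- sentence, roughly the kind of submodule work described above, and times
-- only the construction (no forward/backward pass).
require 'nn'

local batchSize    = 64    -- assumed minibatch size
local sourceLength = 50    -- assumed longest source sentence in the batch
local numBatches   = 1000  -- number of simulated batches

local timer = torch.Timer()
for _ = 1, numBatches do
  local perSentence = nn.ParallelTable()  -- one branch per sentence
  for _ = 1, batchSize do
    local size = torch.random(1, sourceLength)  -- fake per-sentence length
    perSentence:add(nn.Sequential()
      :add(nn.Narrow(1, 1, size))                      -- keep real positions only
      :add(nn.SoftMax())                               -- softmax over them
      :add(nn.Padding(1, sourceLength - size, 1, 0)))  -- zero-pad back to full length
  end
end
print(string.format('construction only: %.3fs for %d batches',
                    timer:time().real, numBatches))

Comparing that construction time against the time of a full forward/backward pass on the same batch size should show whether the overhead is negligible in practice.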