Number of samples in one training step together with accumulation

jechoi · April 2, 2021, 11:00am

Hi,

How many samples does one training step cover when gradient accumulation is used?

For example, if the log output shows training step 1/1000 and you have 2 GPUs, a batch size of 6, and an accum_count of 2, does that one training step include 2 x 6 x 2 samples? Or does it include only 2 x 6 samples, so that the log does not show the actual optimization steps?

francoishernandez · April 2, 2021, 4:45pm

jechoi · April 2, 2021, 5:03pm

Ah, alright. That was also one of my questions, since the printout about loading data seemed off. Thanks!

anderleich · April 7, 2021, 2:26pm

Hi,
I found this topic really interesting.
If a dataset has 4500 examples and we have a batch_size of 4096, 1 GPU, and an accum_count of 4, I guess the dataset is loaded 4 times in each step. Is that correct?

francoishernandez · April 7, 2021, 3:29pm

batch_size of 4096 is probably tokens, not examples?

But if batch_type were to be examples, yes that would be the idea.
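To make the arithmetic in this exchange concrete, here is a minimal sketch. The helper names are hypothetical and not part of OpenNMT-py. Multiplying by the number of GPUs is an assumption based on the first question in this thread, and the calculation ignores how the iterator handles the boundary between passes over the data.

```python
import math


def samples_per_step(batch_size, accum_count, n_gpus=1):
    """Examples consumed by one optimizer step when batch_type is examples.

    Assumption: each GPU processes accum_count batches per step.
    """
    return batch_size * accum_count * n_gpus


def dataset_passes_per_step(dataset_size, batch_size, accum_count, n_gpus=1):
    """How many times the dataset is traversed to fill one step."""
    return samples_per_step(batch_size, accum_count, n_gpus) / dataset_size


# Example from the question above: 4500 examples, batch_size 4096,
# 1 GPU, accum_count 4.
per_step = samples_per_step(4096, 4)
passes = dataset_passes_per_step(4500, 4096, 4)
loads = math.ceil(passes)  # a partial pass still requires loading the data
print(per_step, round(passes, 2), loads)
```

Under these assumptions one step consumes 16384 examples, which is between three and four passes over the 4500 examples, so a fourth load is started within the step.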

anderleich · April 8, 2021, 9:32am

I see. Are examples taken until 4096 tokens are filled?

francoishernandez · April 8, 2021, 5:10pm

Yes.

https://github.com/OpenNMT/OpenNMT-py/blob/8b073fb2a047509ff590839b1194a155ec1a50bf/onmt/inputters/iterator.py#L135-L153


def max_tok_len(new, count, sofar):
    """
    In token batching scheme, the number of sequences is limited
    such that the total number of src/tgt tokens (including padding)
    in a batch <= batch_size
    """
    # Maintains the longest src and tgt length in the current batch
    global max_src_in_batch, max_tgt_in_batch  # this is a hack
    # Reset current longest length at a new batch (count=1)
    if count == 1:
        max_src_in_batch = 0
        max_tgt_in_batch = 0
    # Src: [<bos> w1 ... wN <eos>]
    max_src_in_batch = max(max_src_in_batch, len(new.src[0]) + 2)
    # Tgt: [w1 ... wM <eos>]
    max_tgt_in_batch = max(max_tgt_in_batch, len(new.tgt[0]) + 1)
    src_elements = count * max_src_in_batch
    tgt_elements = count * max_tgt_in_batch
    return max(src_elements, tgt_elements)
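As a rough illustration of how a size function like the one above fills a batch until the token budget is reached, here is a simplified sketch. It is not the library's actual iterator: `Example`, `padded_tokens`, and `token_batches` are hypothetical names, and the size is recomputed from scratch rather than tracked with globals.

```python
from collections import namedtuple

# Hypothetical stand-in for an example: src and tgt are one-element
# tuples whose first item is the token list, matching new.src[0] above.
Example = namedtuple("Example", ["src", "tgt"])


def padded_tokens(examples):
    """Padded size of a batch: count * longest sequence (src or tgt)."""
    max_src = max(len(ex.src[0]) + 2 for ex in examples)  # <bos> and <eos>
    max_tgt = max(len(ex.tgt[0]) + 1 for ex in examples)  # <eos>
    return len(examples) * max(max_src, max_tgt)


def token_batches(examples, batch_size):
    """Greedily add examples while the padded size stays <= batch_size."""
    batch = []
    for ex in examples:
        # Close the current batch if adding this example would overflow it.
        if batch and padded_tokens(batch + [ex]) > batch_size:
            yield batch
            batch = []
        batch.append(ex)
    if batch:
        yield batch


# Four toy examples with 3, 4, 10, and 2 tokens on each side.
data = [Example((["a"] * n,), (["b"] * n,)) for n in (3, 4, 10, 2)]
batches = list(token_batches(data, batch_size=16))
sizes = [len(b) for b in batches]
print(sizes)
```

Because padding is counted, one long sentence inflates the cost of every other sentence in its batch, which is why the number of examples per batch varies when batch_type is tokens.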