The batch will be filled until we reach 1,its size may exceed 16 tokens

Hello, congratulations on such great work.

Could you please help me understand this WARNING? Will it have any effect on the training? How can I avoid it?

I am running a Transformer model on a single GPU with a small batch size of 16.

[2020-11-21 20:29:00,185 INFO] Loading ParallelCorpus(phonemes2text_v2/src-val.txt, phonemes2text_v2/tgt-val.txt, align=None)…
[2020-11-21 20:50:33,074 INFO] Validation perplexity: 13.9302
[2020-11-21 20:50:33,074 INFO] Validation accuracy: 25.8812
[2020-11-21 20:50:44,975 WARNING] The batch will be filled until we reach 1,its size may exceed 16 tokens

This warning is triggered when using batch_type "tokens" and a single example has more tokens than batch_size. Do you really want to train on batches of 16 tokens? You probably meant 16 sentences (batch_type: "sents").
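For example, the two options would look roughly like this in the YAML config, using the standard batch_type / batch_size keys (values are only illustrative, not recommendations):

# Batch by sentences: each batch holds 16 examples, regardless of their length
batch_type: "sents"
batch_size: 16

# Batch by tokens: the token budget should be well above the longest single example
batch_type: "tokens"
batch_size: 4096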


I am also having this issue. I have valid_batch_size: 8, batch_size: 4096, and batch_type: "tokens", which I copied from the Transformer documentation. Should I switch to batch_type: "sents"?

OpenNMT-py trained to 4000 steps and then emitted this warning a number of times:

[2021-02-22 22:01:33,091 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:01:33,100 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:01:33,130 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:14:37,309 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:14:37,366 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,582 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,648 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,650 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,650 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,654 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,660 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,664 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,695 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:09,601 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:09,612 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,204 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,216 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,216 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,256 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,294 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:09,395 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,063 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,065 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,080 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens

GPU usage shown by nvidia-smi dropped from ~95% to ~50%, and training made no progress in 8 hours. onmt_train also shows 100% usage of one CPU core.

# Batching
queue_size: 10000
bucket_size: 32768
world_size: 1
gpu_ranks: [0]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 8
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

Full code

Your valid_batch_size should also be expressed in tokens.
The "tokens" mode for valid_batch_size was introduced in release 2.0.0.
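With token-based batching, a consistent section would look roughly like this (values taken from the config above, not recommendations):

batch_type: "tokens"
batch_size: 4096
valid_batch_size: 4096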

Thanks, I switched to valid_batch_size: 4096.