I'm also having this issue. I have `valid_batch_size: 8`, `batch_size: 4096`, and `batch_type: "tokens"`, which I copied from the Transformer documentation. Should I switch to `batch_type: "sents"`?
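Reading the warning below, my guess is that the `8` is my `valid_batch_size` and the `1` is the minimum of one example per batch, which would mean `valid_batch_size` is being counted in tokens here rather than sentences. If that's right (I haven't confirmed it in the source), the minimal change would be to raise it instead of switching batch types:

```yaml
# My assumption: valid_batch_size is interpreted in the same batch_type
# as training ("tokens"), so 8 meant 8 tokens per validation batch.
valid_batch_size: 2048
```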
OpenNMT-py trained to 4,000 steps and then emitted this warning repeatedly:
```
[2021-02-22 22:01:33,091 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:01:33,100 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:01:33,130 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:14:37,309 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:14:37,366 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,582 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,648 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,650 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,650 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,654 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,660 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,664 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:27:16,695 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:09,601 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:09,612 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,204 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,216 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,216 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,256 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:40:10,294 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:09,395 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,063 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,065 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
[2021-02-22 22:53:10,080 WARNING] The batch will be filled until we reach 1,its size may exceed 8 tokens
```
GPU usage reported by `nvidia-smi` dropped from ~95% to ~50%, and training made no further progress in 8 hours. `onmt_train` also shows one CPU core pinned at 100%.
```yaml
# Batching
queue_size: 10000
bucket_size: 32768
world_size: 1
gpu_ranks: [0]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 8
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]
```
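For comparison, if switching to sentence-based batching really is the fix, I believe the batching section would become something like this (the sizes are my rough guesses at equivalents, not tested):

```yaml
# Untested sketch, assuming batch_type "sents" counts sentences everywhere
batch_type: "sents"
batch_size: 64        # my rough estimate of ~4096 tokens at ~64 tokens/sentence
valid_batch_size: 8   # now counted in sentences, so 8 seems reasonable
```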