Tokens vs examples for batch_type

I’m curious to understand how “tokens” are used when passed as the batch_type parameter. “examples” is really straightforward, as one example corresponds to one line in my training files. But if we specify “batch_size: 3072” with “batch_type: tokens”, does it mean a batch will include lines until it reaches a total of 3072 tokens (tokens separated by spaces)?

I guess that would also mean that batch_size should be adjusted if we use BPE?

Yes, it’s as simple as that.

When training Transformer models you usually want to set the largest batch size that can run on your GPU, so the tokenization is not relevant here.
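To illustrate the idea from the answer above, here is a minimal Python sketch of token-based batching: lines are accumulated until adding the next line would push the batch past the token budget. This is only a toy illustration of the principle (real toolkits also account for padding and sort by length), not OpenNMT’s actual implementation.

```python
def batch_by_tokens(lines, batch_size):
    """Group lines into batches whose total space-separated
    token count does not exceed batch_size."""
    batch, batch_tokens = [], 0
    for line in lines:
        n = len(line.split())
        # Start a new batch if this line would overflow the budget.
        if batch and batch_tokens + n > batch_size:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(line)
        batch_tokens += n
    if batch:
        yield batch

corpus = ["a b c", "d e f g", "h i", "j k l m n"]
batches = list(batch_by_tokens(corpus, batch_size=6))
# → [['a b c'], ['d e f g', 'h i'], ['j k l m n']]
```

Note that with BPE the token count is taken after subword splitting, so the same batch_size covers fewer source lines per batch.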


For your second point, is there an “automatic” way to determine the highest number supported by my GPU?

Yes, you can set the batch size to 0. The training process will try to find the highest value.
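For reference, the setting goes in the train section of the OpenNMT-tf YAML configuration. A sketch, assuming the usual train block layout (the comment values are illustrative):

```yaml
# Sketch of an OpenNMT-tf training configuration.
train:
  batch_type: tokens
  batch_size: 0   # 0 = let the trainer search for the largest batch size that fits
```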

Hi Guillaume!

This would be a great feature. However, when I tried batch_size: 4096 I got Memory Usage 16213MiB / 16280MiB, but when I tried batch_size: 0 I got Memory Usage 7847MiB / 16280MiB.

Is this normal?


Hi @ymoslem,

Is this with OpenNMT-tf?

Apologies, it is with OpenNMT-py. Should it still work?

No, it will not work. This batch size auto-selection is only implemented in OpenNMT-tf at the moment.
