I’m curious to understand how “tokens” is handled by the system when it is used as the batch_type parameter. “examples” is really straightforward, as one example corresponds to one line in my training files, but if we specify “batch_size: 3072” with “batch_type: tokens”, does that mean a batch will include lines until it reaches a total of 3072 tokens (tokens separated by spaces)?
I guess that would also mean that batch_size should be adjusted if we use BPE?
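To make my question more concrete, here is a minimal Python sketch of what I imagine token-based batching does (just my guess, not the actual implementation; the function name and logic are mine):

```python
# Rough sketch of my mental model of batch_type: tokens (not the real implementation).
# Lines are accumulated until adding the next one would exceed the token budget.

def token_batches(lines, batch_size_tokens=3072):
    """Group whitespace-tokenized lines into batches of at most batch_size_tokens tokens."""
    batch, batch_tokens = [], 0
    for line in lines:
        n_tokens = len(line.split())  # tokens separated by spaces
        if batch and batch_tokens + n_tokens > batch_size_tokens:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(line)
        batch_tokens += n_tokens
    if batch:
        yield batch
```

If that is roughly right, then with BPE each line splits into more tokens, so fewer sentences would fit in the same token budget.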
This would be a great feature. However, when I tried batch_size: 4096, the memory usage was 16213MiB / 16280MiB, but when I tried batch_size: 0, the memory usage was only 7847MiB / 16280MiB.