I’m curious to understand how “tokens” is handled by the system when it is used as the batch_type parameter. “examples” is really straightforward, as one example corresponds to one line in my training files, but if we specify “batch_size: 3072” with “batch_type: tokens”, does that mean a batch will include lines until it reaches a total of 3072 tokens (tokens separated by spaces)?
I guess that would also mean that batch_size should be adjusted if we use BPE?
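To make my question more concrete, here is a minimal Python sketch of what I imagine token-based batching does (just my guess, not the actual implementation; the function name and logic are mine):

```python
# Rough sketch of my mental model of batch_type: tokens (not the real implementation).
# Lines are accumulated until adding the next one would exceed the token budget.

def token_batches(lines, batch_size_tokens=3072):
    """Group whitespace-tokenized lines into batches of at most batch_size_tokens tokens."""
    batch, batch_tokens = [], 0
    for line in lines:
        n_tokens = len(line.split())  # tokens separated by spaces
        if batch and batch_tokens + n_tokens > batch_size_tokens:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(line)
        batch_tokens += n_tokens
    if batch:
        yield batch
```

If that is roughly right, then with BPE each line splits into more tokens, so fewer sentences would fit in the same token budget.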
This would be a great feature. However, when I tried batch_size: 4096, the memory usage was 16213MiB / 16280MiB, but when I tried batch_size: 0, the memory usage was only 7847MiB / 16280MiB.