Hello!
In the documentation under “train”, it says:
```yaml
# (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
batch_type: examples
# (optional) Tune gradient accumulation to train with at least this effective batch size
# (default: null).
effective_batch_size: 25000
```
I think the stated defaults are wrong: the default batch_type
for training is tokens (not examples), and the default effective_batch_size
is 25000 (not null).
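If that is correct, the snippet would read something like the following (this is only my suggested wording, not taken from the docs):

```yaml
# (optional) Batch size is the number of "examples" or "tokens" (default: "tokens").
batch_type: tokens
# (optional) Tune gradient accumulation to train with at least this effective batch size
# (default: 25000).
effective_batch_size: 25000
```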
Many thanks,
Yasmin