CUDA out of memory in validation step

I am using the following command to train a Transformer system:

python -data data_big/big -save_model demo-model_big -layers 6 -rnn_size 512 -word_vec_size 512 \
-transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 200000 \
-max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 -optim adam -adam_beta2 0.998 \
-decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0  -param_init_glorot -label_smoothing 0.1 -valid_steps 5000 \
-save_checkpoint_steps 2500 -world_size 1 -gpu_ranks 0 -valid_batch_size 4

I obtain the following error at the first validation step:

RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 14.73 GiB total capacity; 8.46 GiB already allocated; 4.96 GiB free; 545.61 MiB cached)

As you can see, -valid_batch_size has already been reduced, as suggested in other posts, but it doesn’t seem to help.

What can I do? I am training on 40 million sentences, and the validation set consists of 40k.

You must have a very, very long segment in your validation set. Double-check this first.
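
One way to check this is to list the longest lines in the validation files. Below is a minimal sketch, assuming the validation data is plain text with one segment per line and whitespace-separated tokens. The function names and the file name in the usage note are hypothetical, not part of any toolkit. Very long segments matter because Transformer attention memory grows roughly with the square of the sequence length.

```python
import heapq


def longest_segments(lines, top_k=5):
    """Return (token_count, line_number, text) for the top_k longest lines.

    Assumes one segment per line and whitespace tokenization.
    """
    counted = (
        (len(line.split()), number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
    )
    # nlargest sorts by token count first, so the longest segment comes first.
    return heapq.nlargest(top_k, counted)


def longest_segments_in_file(path, top_k=5):
    """Same as longest_segments, reading from a UTF-8 text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return longest_segments(handle, top_k)
```

For example, calling longest_segments_in_file("valid.src") on a hypothetical source-side validation file returns the five longest segments with their line numbers, so they can be inspected, truncated, or removed from the validation set. Run it on both the source and target sides.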

I tried -valid_batch_size 2 and it seems to work (at least for the first validation step). In your opinion, could such a small validation batch size hurt final performance?

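No, it’s not a problem. The validation batch size only changes how the validation set is split up for evaluation; it plays no part in the gradient updates.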
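
To illustrate why the validation batch size should not change the reported score, here is a small sketch. It assumes the validation loss is accumulated over the whole set and then divided by the total token count. This is a generic illustration, not the toolkit's actual code, and the per-sentence losses in the usage note are made-up numbers.

```python
def token_average_loss(per_sentence_losses, batch_size):
    """Average loss per token over a validation set processed in batches.

    per_sentence_losses: list of (summed_loss, n_tokens) pairs, one per sentence.
    batch_size: number of sentences per validation batch.
    """
    total_loss = 0.0
    total_tokens = 0
    for start in range(0, len(per_sentence_losses), batch_size):
        batch = per_sentence_losses[start:start + batch_size]
        # Accumulate totals; the batch boundaries do not affect the sums.
        total_loss += sum(loss for loss, _ in batch)
        total_tokens += sum(n_tokens for _, n_tokens in batch)
    return total_loss / total_tokens
```

With illustrative data such as [(12.0, 4), (30.5, 10), (7.25, 3), (50.0, 20)], calling the function with batch_size 2 and with batch_size 4 gives the same average. A smaller validation batch only makes validation slower, since it takes more steps to cover the set.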