I am using the following command to train a Transformer system:
python train.py -data data_big/big -save_model demo-model_big -layers 6 -rnn_size 512 -word_vec_size 512 \
    -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 200000 \
    -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 \
    -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 5000 \
    -save_checkpoint_steps 2500 -world_size 1 -gpu_ranks 0 -valid_batch_size 4
I get the following error at the first validation step:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 14.73 GiB total capacity; 8.46 GiB already allocated; 4.96 GiB free; 545.61 MiB cached)
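For reference, the numbers in the message are internally consistent: the allocation request is larger than the remaining free memory on the GPU (a quick plain-Python sanity check on the figures reported above; the variable names are mine):

```python
# Figures taken directly from the CUDA OOM message above (all in GiB).
tried_to_allocate = 5.59
total_capacity = 14.73
already_allocated = 8.46
free = 4.96

# The request exceeds what is free, so the allocation must fail.
shortfall = tried_to_allocate - free
print(f"Shortfall: {shortfall:.2f} GiB")  # prints "Shortfall: 0.63 GiB"
```

So roughly 0.63 GiB more free memory would be needed for this single validation-time allocation to succeed.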
As you can see, -valid_batch_size has already been lowered, as suggested in other posts, but that doesn't seem to help.
What else could I try? I am training on 40 million sentences, and the validation set contains 40k.