OpenNMT Forum

Transformer model on 2 GPU vs. 4 GPU

(Yasmin Moslem) #1


I want to run the transformer model with the parameters mentioned at:

However, the machine I am currently using has only 2 GPUs. Should I adjust any of the recommended values to get the expected results?

Many thanks,


I have two GPUs, so I add the parameters like this:

-world_size 2 -gpu_ranks 0 1

That keeps both GPUs working.
I also use watch -n 1 -d nvidia-smi to monitor their performance.
Just try it; experience is the best teacher.

(Yasmin Moslem) #3

Thanks, Yaren, for your reply.

Yes, sure about -world_size 2 -gpu_ranks 0 1. I just wondered if I should change other parameters, like -batch_size.

Thanks indeed for the tip!

Kind regards,


-batch_size should be changed to fit your GPU RAM, and
-train_steps depends on the number of sentence pairs in your corpus.
Otherwise, you can use the command as given here:

Copied here for you:

python train.py -data /tmp/de2/data -save_model /tmp/extra \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0 -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 4 -gpu_ranks 0 1 2 3
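
For the 2-GPU case asked about above, one thing that may be worth checking is how many tokens go into each parameter update. Here is a rough sketch, assuming the effective batch is roughly batch_size x accum_count x number of GPUs (that is my assumption about how these options combine, not something the command itself states). The variable names are only illustrative.

```shell
#!/bin/sh
# Rough sketch: keep tokens-per-update constant when moving from 4 GPUs to 2.
# Assumption: effective batch ~= batch_size * accum_count * number of GPUs.

batch_size=4096   # -batch_size from the command above (tokens per batch)
ref_accum=2       # -accum_count from the command above
ref_gpus=4        # -world_size from the command above
my_gpus=2         # GPUs actually available

# Tokens contributing to one parameter update in the 4-GPU setup.
target_tokens=$((batch_size * ref_accum * ref_gpus))

# accum_count giving the same total on my_gpus (integer division).
my_accum=$((target_tokens / (batch_size * my_gpus)))

echo "tokens per update (4-GPU reference): $target_tokens"
# -gpu_ranks is written out by hand here for the 2-GPU case.
echo "flags to try: -world_size $my_gpus -gpu_ranks 0 1 -accum_count $my_accum"
```

If 4096 tokens do not fit in your GPU RAM, the same idea should apply in the other direction: lower -batch_size and raise -accum_count by the same factor so the product stays about the same.
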
(Yasmin Moslem) #5

Thanks, Yaren, for your insights!
I am currently training a model with the Transformer recommended parameters and will compare the results to the model I already trained with the default parameters for the same corpus.