Transformer training not starting

I recently installed OpenNMT-py via pip (Python 3.7.4, PyTorch 1.2.0). I'm trying to train a Transformer, but OpenNMT-py seems to hang as soon as training starts (at line 81 in train.py, it is waiting for a PID from the OS). Below is the shell script used to preprocess the data and train the Transformer.

    onmt_preprocess -train_src data/en_nso/ennso_parallel.train.en \
    -train_tgt data/en_nso/ennso_parallel.train.nso \
    -valid_src data/en_nso/ennso_parallel.dev.en \
    -valid_tgt data/en_nso/ennso_parallel.dev.nso \
    -save_data data/en_nso/ -share_vocab \
    -src_vocab_size 8000 -tgt_vocab_size 8000 -overwrite

    CUDA_VISIBLE_DEVICES=1 python3 train.py -data data/en_nso/ -save_model results \
    -layers 4 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 4 \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
    -batch_size 8000 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 16000 -learning_rate 2 \
    -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
    -world_size 4 -gpu_ranks 0

Below is a screenshot of the error:

[screenshot: the training process hanging at startup, as described above]
Read the FAQ here:

If you have one GPU: `-world_size 1 -gpu_ranks 0`
If you have four GPUs: `-world_size 4 -gpu_ranks 0 1 2 3`

Your command sets `-world_size 4` but lists only `-gpu_ranks 0`, and `CUDA_VISIBLE_DEVICES=1` exposes a single GPU, so the one spawned process most likely sits waiting for three other ranks that are never launched. That is why training hangs at startup.
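
For reference, here is a minimal sketch of the corrected invocation, assuming you want to train on the single card that `CUDA_VISIBLE_DEVICES=1` exposes; every flag except the last line is unchanged from your command:

    # Optional sanity check: how many GPUs can PyTorch actually see?
    # With CUDA_VISIBLE_DEVICES=1 this should print 1.
    CUDA_VISIBLE_DEVICES=1 python3 -c "import torch; print(torch.cuda.device_count())"

    # Single visible GPU: world_size must equal the number of ranks listed.
    CUDA_VISIBLE_DEVICES=1 python3 train.py -data data/en_nso/ -save_model results \
    -layers 4 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 4 \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
    -batch_size 8000 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 16000 -learning_rate 2 \
    -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
    -world_size 1 -gpu_ranks 0

To actually train on four GPUs instead, expose all of them and list every rank: `CUDA_VISIBLE_DEVICES=0,1,2,3` with `-world_size 4 -gpu_ranks 0 1 2 3`.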