OpenNMT-py experiment results fluctuation

I’m using OpenNMT-py. The NMT architecture is an encoder-decoder bidirectional LSTM with global attention. Tokenized data was used as the training data.

I managed to train the NMT model successfully and obtained a BLEU score against a test set.

Afterwards, to check the replicability of the results, I ran the same experiment multiple times. However, each time the result fluctuates by +/- 0.5 BLEU points.

(1) Is this typical behaviour? Can I assume that, since pretrained embeddings are not used, the random initialization on each run causes this?

(2) Is there a way to make the results consistent by setting training parameters, etc.?

You can check the seed parameter, which allows you to set a specific seed for the random initializations.
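For reference, a fixed seed just pins the random number generators before parameter initialization and data shuffling begin. Roughly the following idea (a sketch of the mechanism, not OpenNMT-py’s exact internals):

    import random
    import numpy as np
    import torch

    def set_random_seed(seed: int) -> None:
        # Pin all RNGs so weight init and shuffling start from the same state.
        random.seed(seed)        # Python's built-in RNG
        np.random.seed(seed)     # NumPy RNG
        torch.manual_seed(seed)  # CPU and CUDA RNGs (used for weight init)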

What would be an acceptable range for the seed? Any positive integer?

Could you also let me know the importance of the --param_init parameter?

Are both --seed and --param_init recommended for an experiment?

Yes, any positive integer should be ok for seed.

param_init is set to 0.1 by default. You can experiment with changing the value if you want, but it’s indeed probably not a good idea to disable it. For more details, there are plenty of resources out there about neural network initialization.
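To illustrate what the option controls (a hedged sketch, not the library’s actual code): --param_init 0.1 amounts to drawing every parameter from a uniform distribution over [-0.1, 0.1], while setting it to 0 skips this step and keeps each layer’s default initialization:

    import torch.nn as nn

    def init_params(model: nn.Module, param_init: float = 0.1) -> None:
        # Uniform init over [-param_init, param_init]; 0 leaves the layers'
        # default initialization untouched.
        if param_init > 0:
            for p in model.parameters():
                nn.init.uniform_(p, -param_init, param_init)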

Comments well noted. Thank you.

I have tried with --seed 42; the training parameters are set as follows. With the same parameters I have run the experiment twice.

!CUDA_VISIBLE_DEVICES=0,1 onmt_train \
    --data '/content/drive/MyDrive/baseline/baselineC/src-tgt' \
    --save_model '/content/drive/MyDrive/baseline/baselineC/run1' \
    --seed 42 \
    --src_word_vec_size 500 \
    --tgt_word_vec_size 500 \
    --encoder_type brnn \
    --decoder_type rnn \
    --rnn_size 500 \
    --enc_layers 2 \
    --dec_layers 2 \
    --rnn_type LSTM \
    --global_attention dot \
    --batch_size 32 \
    --optim adam \
    --adam_beta1 0.9 \
    --adam_beta2 0.999 \
    --dropout 0.4 \
    --learning_rate 0.001 \
    --train_steps 120000 \
    --valid_steps 5000 \
    --report_every 5000 \
    --gpu_ranks 0

However, the BLEU scores obtained on the validation set differ between the two runs. Could you advise?

RUN1
run1_step_80000.pt : BLEU = 16.73
run1_step_85000.pt : BLEU = 16.97
run1_step_90000.pt : BLEU = 16.97
run1_step_95000.pt : BLEU = 16.99

RUN2
run1_step_80000.pt : BLEU = 16.86
run1_step_85000.pt : BLEU = 17.16
run1_step_90000.pt : BLEU = 17.17
run1_step_95000.pt : BLEU = 17.08

That’s not expected.
A few questions:

  • which version of OpenNMT and PyTorch are you using?
  • did both experiments run on exactly the same machine, and on exactly the same GPU on this machine?

IIRC, a fixed seed does not fully guarantee deterministic results. This can change depending on the platform and device. As Google Colab most probably distributes its workload across a large number of machines, slight differences are to be expected, I guess.

(Can’t find the exact discussion I remember reading about this, but this can be a start: Reproducibility — PyTorch 1.7.1 documentation.)
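In practice, the recommendations on that page boil down to seeding everything and forcing deterministic kernels, at some cost in speed. A minimal sketch (assuming a reasonably recent PyTorch):

    import torch

    torch.manual_seed(42)
    # Make cuDNN pick deterministic conv/RNN kernels and disable the autotuner,
    # which can otherwise select different algorithms on different runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # On PyTorch >= 1.8 you can additionally raise errors on non-deterministic ops:
    # torch.use_deterministic_algorithms(True)

Even with all of this set, runs on different GPUs or driver versions can still diverge slightly.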