I’m using OpenNMT-py. The NMT architecture is an encoder-decoder model with a bidirectional LSTM encoder and global attention. The training data has been tokenized beforehand.
I managed to train the NMT model successfully and obtained a BLEU score against a test set.
Afterwards, to check the replicability of the results, I ran the same experiment multiple times. However, each time the result fluctuates by +/- 0.5 BLEU points.
(1) Is this typical behaviour? Since pretrained embeddings are not used, is the random initialization on each run what causes this?
(2) Is there a way to make the results consistent by setting training parameters, etc.?
param_init is set to 0.1 by default, which means parameters are initialized from a uniform distribution over [-0.1, 0.1]. You can experiment with changing the value if you want, but it’s indeed probably not a good idea to disable it. For more details, you can find a lot of resources out there about neural network initialization.
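For intuition, here is a minimal sketch of what that initialization scheme looks like in plain PyTorch (the layer sizes are arbitrary, chosen just for illustration):

```python
import torch.nn as nn

# Hypothetical sketch: uniform initialization in [-param_init, param_init],
# the scheme OpenNMT-py's param_init option corresponds to.
param_init = 0.1

model = nn.LSTM(input_size=512, hidden_size=512)
for p in model.parameters():
    p.data.uniform_(-param_init, param_init)
```

Every weight then lies within [-0.1, 0.1]; a larger value widens that range, and 0 would leave all parameters at zero, which is why disabling it is a bad idea.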
IIRC, a fixed seed does not fully guarantee deterministic results; this can vary depending on the platform and device. Since Google Colab most likely distributes its workload across a huge number of machines, slight differences are to be expected, I guess.
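To get as close to determinism as PyTorch allows on a single machine, you would typically seed every RNG and force deterministic cuDNN kernels. A hedged sketch (the helper name `set_seed` is my own; none of this removes cross-hardware variation):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    # Seed all RNGs that training code commonly draws from.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels; this can slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with all of this, runs on different GPU models, CUDA versions, or Colab backends can still diverge slightly, which matches the +/- 0.5 BLEU spread described above.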