I’m using OpenNMT-py. The NMT architecture is an encoder-decoder bidirectional LSTM with global attention. Tokenized data is used for training.
I managed to train the NMT model successfully and obtained a BLEU score on a test set.
Afterwards, to check the reproducibility of the results, I ran the same experiment multiple times. However, the BLEU score fluctuates by ±0.5 points from run to run.
(1) Is this typical behaviour? Since I don’t use pretrained embeddings, can I assume the random weight initialization on each run causes this variation?
(2) Is there a way to make the results consistent, e.g. by setting training parameters?
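For context on question (2), here is a minimal sketch of the kind of RNG seeding that typically reduces run-to-run variance in PyTorch-based toolkits like OpenNMT-py. The `set_seed` helper and the seed value 42 are my own illustration, not part of the OpenNMT-py API (check whether your OpenNMT-py version exposes its own seed option and prefer that if so):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    # Seed the Python, NumPy, and PyTorch RNGs so that weight
    # initialization and data shuffling repeat across runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without a GPU
    # Ask cuDNN for deterministic kernels (can be slower).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Demonstration: re-seeding reproduces the same random draws,
# which is what makes repeated training runs start identically.
set_seed(42)
a = torch.randn(3)
set_seed(42)
b = torch.randn(3)
print(torch.equal(a, b))
```

Note that even with fixed seeds, some GPU kernels are non-deterministic, so small BLEU differences can remain across runs on CUDA hardware.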