Very low BLEU with an almost-baseline NMT model

I have raised an issue on GitHub regarding the performance of my baseline model, trained with the default NMT parameters except for "global attention", which I set to "mlp".
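For context, a minimal sketch of such a training run, assuming OpenNMT-py; the data and model paths are placeholders, and only the attention option deviates from the defaults:

```shell
# Hypothetical sketch (OpenNMT-py assumed); paths are placeholders.
# Everything is left at its default except -global_attention.
python train.py \
    -data data/demo \
    -save_model demo-model \
    -global_attention mlp
```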

Issue details here:

Am I missing a setting? I don't think changing the attention type should have such a drastic effect. Any help would be appreciated.

There was a mistake in the documentation. With the 200k dataset, perplexity is around 29.
You can find more details here: