I have raised an issue on GitHub regarding the performance of my baseline NMT model when trained with default parameters, except for `global_attention`, which I set to `mlp`.
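For reference, the training invocation was roughly the following (a sketch assuming the standard OpenNMT-py `train.py` entry point; the data and model paths are placeholders, and every other option is left at its default):

```shell
# Preprocessed data prefix and model save path are placeholders.
# Only -global_attention deviates from the defaults.
python train.py -data data/demo -save_model demo-model \
    -global_attention mlp
```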
Full details are in the issue here: https://github.com/OpenNMT/OpenNMT-py/issues/553
Am I missing some setting? I do not think changing the attention type should have such a drastic effect. Any help would be appreciated.