Just wondering if you ever tested these techniques as an alternative to dropout.
There was a pull request implementing recurrent batch normalization, but it did not work at all.
Are you referring to a paper or another framework?
I see this: https://github.com/OpenNMT/OpenNMT/pull/101
So nobody actually fixed it, right?