Hello, I have read the paper "Lessons on Parameter Sharing across Layers in Transformers" (listed on Papers With Code). The method seems to work well, with the best BLEU score on the WMT 2014 English-German translation task, so I wondered whether it is implemented in OpenNMT-py. If not, no problem, but at least more people will know about it.
Thank you for reading.