Shared embedding matrix in transformer?

Xuanqing · October 28, 2018, 7:28pm

I once trained the Transformer using these scripts: https://github.com/OpenNMT/OpenNMT-tf/tree/master/scripts/wmt
together with the training parameters from the OpenNMT-py FAQ section.

But I also noticed that the TensorFlow implementation (https://github.com/tensorflow/models/tree/master/official/transformer) shares the vocabulary embedding matrix between the encoder and the decoder; this is also mentioned in the blog post “The Annotated Transformer”. OpenNMT-py does not seem to do this. Does anyone know what difference it makes?

guillaumekln · October 30, 2018, 4:22pm

Embedding sharing can be enabled optionally; see the OpenNMT-py options:

-share_embeddings
-share_decoder_embeddings

It is not required, and if you have lots of data you should even get better results without sharing, because the model then has more parameters.
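
To make the "more parameters" point concrete, here is a rough sketch that counts the weights in the three vocabulary-sized matrices of a Transformer: the encoder embedding, the decoder embedding, and the output projection. In OpenNMT-py, -share_embeddings shares the word embeddings between encoder and decoder (it needs a shared vocabulary), and -share_decoder_embeddings ties the decoder input embeddings to the output projection. The function name embedding_params and the sizes used (32000 joint vocabulary, model dimension 512) are assumptions for illustration only, not values from this thread, and biases are ignored.

```python
def embedding_params(vocab_size, d_model,
                     share_embeddings=False,
                     share_decoder_embeddings=False):
    """Count weights in the vocabulary-sized matrices of a Transformer.

    Hypothetical helper for illustration. Assumes one joint vocabulary
    of size vocab_size for source and target, and ignores bias terms.
    """
    matrix = vocab_size * d_model  # size of one vocab x d_model matrix

    encoder_emb = matrix
    # Sharing encoder/decoder embeddings: the decoder reuses the encoder matrix.
    decoder_emb = 0 if share_embeddings else matrix
    # Tying the decoder input embedding to the output projection:
    # the projection reuses the decoder embedding matrix.
    generator = 0 if share_decoder_embeddings else matrix

    return encoder_emb + decoder_emb + generator


if __name__ == "__main__":
    vocab_size, d_model = 32000, 512  # assumed sizes, for illustration only
    for enc_dec in (False, True):
        for dec_out in (False, True):
            n = embedding_params(vocab_size, d_model, enc_dec, dec_out)
            print(f"share_embeddings={enc_dec!s:5} "
                  f"share_decoder_embeddings={dec_out!s:5} -> {n:,} weights")
```

Under these assumptions, no sharing gives three separate matrices (49,152,000 weights), while enabling both options leaves a single matrix (16,384,000 weights). Whether the extra capacity actually helps depends on how much training data you have.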