Shared embedding matrix in transformer?

pytorch

(Xuanqing Liu) #1

I trained a Transformer using the scripts at https://github.com/OpenNMT/OpenNMT-tf/tree/master/scripts/wmt
together with the training parameters from the OpenNMT-py FAQ section.

But I also noticed that the TensorFlow implementation (https://github.com/tensorflow/models/tree/master/official/transformer) shares the vocab embedding matrix between the encoder and decoder, which is also mentioned in the blog post "The Annotated Transformer". OpenNMT-py does not seem to do this by default. Does anyone know the reason for the difference?


(Guillaume Klein) #2

Embedding sharing can be optionally enabled, see the OpenNMT-py options:

  • -share_embeddings
  • -share_decoder_embeddings

It is not required, and if you have lots of data you may even get better results without sharing (because there are more parameters).
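For reference, weight tying of this kind boils down to reusing the same `nn.Parameter` in several modules. Here is a minimal PyTorch sketch (not the actual OpenNMT-py code; module names and sizes are illustrative) of what the two options conceptually do: sharing one embedding table between encoder and decoder, and additionally tying the decoder embedding to the output projection:

```python
import torch.nn as nn

vocab_size, d_model = 32000, 512

# -share_embeddings: one embedding instance reused by both encoder and
# decoder (this requires a joint source/target vocabulary).
shared_emb = nn.Embedding(vocab_size, d_model)
encoder_emb = shared_emb
decoder_emb = shared_emb

# -share_decoder_embeddings: additionally tie the decoder input embedding
# with the output projection ("generator"). nn.Linear(d_model, vocab_size)
# stores its weight as (vocab_size, d_model), the same shape as the
# embedding table, so the Parameter can be assigned directly.
generator = nn.Linear(d_model, vocab_size, bias=False)
generator.weight = decoder_emb.weight  # same Parameter, updated jointly

# All three modules now point at one underlying tensor.
assert encoder_emb.weight.data_ptr() == decoder_emb.weight.data_ptr()
assert generator.weight.data_ptr() == shared_emb.weight.data_ptr()
```

Because the tied modules hold the very same `Parameter` object, gradients from the encoder, decoder, and generator all accumulate into one tensor, which reduces the model size at the possible cost of capacity mentioned above.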