Shared embedding matrix in transformer?


(Xuanqing Liu) #1

I once tried the Transformer using the scripts:
and the training parameters from the OpenNMT-py FAQ section.

But I also noticed that the TensorFlow implementation: shares the vocabulary embedding matrix between the encoder and the decoder; this is also mentioned in the blog post “The Annotated Transformer”. OpenNMT-py does not seem to do this (??). Does anyone know what difference it makes?

(Guillaume Klein) #2

Embedding sharing can optionally be enabled; see the OpenNMT-py options:

  • -share_embeddings
  • -share_decoder_embeddings

It is not required, and if you have lots of data you should even get better results without sharing, because the model then has more parameters.
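
To put a rough number on "more parameters", here is a minimal sketch in plain Python. It assumes a single vocabulary shared by source and target, so that all three vocabulary-sized matrices (encoder input embeddings, decoder input embeddings, decoder output projection) have the same shape, and it ignores biases. The function name and the sizes (32,000 tokens, dimension 512) are hypothetical and chosen only for illustration; the two keyword arguments mirror the meaning of the options above and are not OpenNMT-py code.

```python
def embedding_param_count(vocab_size, dim,
                          share_embeddings=False,
                          share_decoder_embeddings=False):
    """Count weights in the vocabulary-sized matrices of an encoder-decoder.

    Assumes one shared vocabulary and ignores bias terms (illustrative only).
    """
    matrix = vocab_size * dim
    # Three candidate matrices: encoder input embeddings,
    # decoder input embeddings, decoder output projection.
    count = 3 * matrix
    if share_embeddings:
        # Encoder and decoder input embeddings become one matrix.
        count -= matrix
    if share_decoder_embeddings:
        # Decoder input embeddings and output projection become one matrix.
        count -= matrix
    return count


if __name__ == "__main__":
    vocab_size, dim = 32000, 512  # hypothetical sizes
    for enc_dec in (False, True):
        for dec_out in (False, True):
            n = embedding_param_count(vocab_size, dim, enc_dec, dec_out)
            print(f"share_embeddings={enc_dec!s:5} "
                  f"share_decoder_embeddings={dec_out!s:5} -> {n:,} weights")
```

Under these assumptions, enabling both options leaves one matrix where there were three, which is the capacity you give up in exchange for a smaller model.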