I defined a multi-source transformer model and tried to share the word embedding weights between all sources and the target by reusing the same WordEmbedder instance, adapting the MultiSourceTransformer example:
```python
import opennmt as onmt

class MultiSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        # Don't use this, it doesn't work
        # (the same WordEmbedder instance is reused for all three sources
        # and the target, hoping the embedding weights end up shared).
        embedder = onmt.inputters.WordEmbedder(embedding_size=512)
        super().__init__(
            source_inputter=onmt.inputters.ParallelInputter([embedder] * 3),
            target_inputter=embedder,
            num_layers=6,
            num_units=768,
            num_heads=8,
            ffn_inner_dim=1024,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True,
        )

model = MultiSourceTransformer
```
Well, after training a model I can see that something went wrong: the training process itself seemed to run fine, but after loading the model it produces garbage.
Now, I found another post which describes a presumably working way of sharing embeddings:
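From what I can tell, that approach relies on the library's built-in sharing options rather than on reusing a single layer instance. Roughly, I understand it to look like this (just my sketch, not verified: I'm assuming `share_parameters` on `ParallelInputter`, `share_embeddings` with `onmt.models.EmbeddingsSharingLevel`, and a single vocabulary shared by all sources and the target; the class name is only for illustration):

```python
import opennmt as onmt

class SharedEmbeddingsMultiSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        super().__init__(
            source_inputter=onmt.inputters.ParallelInputter(
                # One embedder per source; share_parameters should tie their weights.
                [onmt.inputters.WordEmbedder(embedding_size=512) for _ in range(3)],
                share_parameters=True,
            ),
            target_inputter=onmt.inputters.WordEmbedder(embedding_size=512),
            num_layers=6,
            num_units=768,
            num_heads=8,
            ffn_inner_dim=1024,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True,
            # Let the library tie source/target embeddings (and the softmax weights);
            # this assumes all sources and the target use the same vocabulary.
            share_embeddings=onmt.models.EmbeddingsSharingLevel.ALL,
        )

model = SharedEmbeddingsMultiSourceTransformer
```

If I read it correctly, the difference to my version is that the variables are tied by the library itself instead of the model reusing the same WordEmbedder object in several places.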
I don't understand enough about the internals of OpenNMT to figure out what actually went wrong with my broken approach, and it would be nice to know.