Question about using the Pretrained embedding

Hi @guillaumekln,

In OpenNMT-tf, I added a class to use pretrained embeddings with the Transformer, like below:

```python
super(Transformer, self).__init__(
    source_inputter=onmt.inputters.WordEmbedder(
        vocabulary_file_key="source_words_vocabulary",
        embedding_file_key="src_embedding",
        embedding_size=512,
        dtype=dtype),
    target_inputter=onmt.inputters.WordEmbedder(
        vocabulary_file_key="target_words_vocabulary",
        embedding_file_key="tgt_embedding",
        embedding_size=512,
        dtype=dtype),
    num_layers=6,
    num_units=512,
    num_heads=8,
    ffn_inner_dim=2048,
    dropout=0.1,
    attention_dropout=0.1,
    relu_dropout=0.1)
```

Here are my questions:

1. I assume embedding_file_key refers to GloVe or word2vec embedding files, but their dimension is generally 300 or 200, so can I still set embedding_size=512?

2. Like when using pretrained embeddings in OpenNMT-py (How to use GloVe pre-trained embeddings in OpenNMT-py), there are many missing embeddings (OpenNMT-py results below):

• enc: 20925 match, 8793 missing, (70.41%)
• dec: 20923 match, 13342 missing, (61.06%)

Filtered embeddings:
• enc: torch.Size([29718, 300])
• dec: torch.Size([34265, 200])

How does the system handle the missing embeddings? Are the missing tokens' embeddings randomly initialized at the beginning of training, and are both the missing and matched embeddings then updated during training?

You should not set embedding_size; it will be inferred from the embedding file. The Transformer model supports embedding sizes that are different from num_units.
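For example, a minimal sketch reusing the same data keys as in your snippet (this only illustrates the point, it is not a complete model definition):

```python
# Sketch only: embedding_size is omitted, so the dimension is taken from the
# pretrained file referenced by "src_embedding" (e.g. 300 for GloVe vectors),
# while num_units stays at 512 for the Transformer layers.
source_inputter = onmt.inputters.WordEmbedder(
    vocabulary_file_key="source_words_vocabulary",
    embedding_file_key="src_embedding",
    dtype=dtype)
```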

Yes.

Yes.
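To illustrate the idea (a simplified sketch, not the actual OpenNMT loading code): tokens found in the pretrained file copy their vectors, the others keep their random initialization, and the whole matrix is then updated during training unless you freeze it.

```python
import numpy as np

# Simplified sketch of merging pretrained vectors with a model vocabulary.
# Hypothetical helper, not an OpenNMT API: "vocab" is a list of tokens and
# "pretrained" maps token -> numpy vector of size "dim".
def build_embedding_matrix(vocab, pretrained, dim, scale=0.1):
    # Every row starts with a random initialization (the "missing" case).
    matrix = np.random.uniform(-scale, scale, (len(vocab), dim)).astype("float32")
    for index, token in enumerate(vocab):
        vector = pretrained.get(token)
        if vector is not None:
            matrix[index] = vector  # "match" case: overwrite with the pretrained row
    return matrix
```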

Thanks very much for your help! @guillaumekln
Is the setting above enough?

none should be None. The rest looks good to me.