Hi,
I would like to use pretrained word embeddings, but only for one of multiple inputs in a parallel inputter. I understand that with older versions one would set `embedding_file_key="source_word_embeddings"` on the inputter and then point `source_word_embeddings` to the embedding file in the YAML, but with the current version (2.9.0) I get:

```
TypeError: ('Keyword argument not understood:', 'embedding_file_key')
```
The documentation says that I have to define source word embeddings as:
```yaml
data:
  source_embedding:
    path: data/glove/glove-100000.txt
    with_header: True
    case_insensitive: True
    trainable: False
```
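For context, here is my understanding of the text embedding file format those options refer to: the optional header line (`with_header`) holds the word count and dimension, and `case_insensitive` means tokens are lowercased before lookup. A minimal self-contained sketch of my own (not OpenNMT-tf's actual loader):

```python
def load_embedding_file(lines, with_header=True, case_insensitive=True):
    """Parse a GloVe/word2vec-style text embedding file.

    Each data line is: token v1 v2 ... vN. If with_header is set,
    the first line ("num_words dim") is skipped.
    """
    if with_header:
        lines = lines[1:]  # skip the "count dim" header line
    vectors = {}
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        token = fields[0].lower() if case_insensitive else fields[0]
        vectors[token] = [float(v) for v in fields[1:]]
    return vectors

# Tiny example with a word2vec-style header ("2 3" = 2 words, dimension 3).
sample = ["2 3", "The 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
emb = load_embedding_file(sample)
```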
But I can’t find how to do this for only one of multiple inputters.
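Following the indexed naming used for the vocabularies (`source_1_vocabulary`, `source_2_vocabulary`, …), my guess would be something like the snippet below, but I could not find this confirmed anywhere in the docs, so the `source_3_embedding` key is purely my assumption:

```yaml
data:
  source_3_embedding:
    path: data/glove/glove-100000.txt
    with_header: True
    case_insensitive: True
    trainable: False
```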
Model I would like to run (with old way of defining pretrained embeddings):
```python
class MyAttentionModel(_RNNBase):
    def __init__(self):
        super(MyAttentionModel, self).__init__(
            source_inputter=inputters.ParallelInputter(
                [
                    inputters.WordEmbedder(embedding_size=512),
                    inputters.WordEmbedder(embedding_size=16),
                    inputters.WordEmbedder(
                        embedding_size=512,
                        embedding_file_key="source_word_embeddings",
                    ),
                ],
                reducer=layers.ConcatReducer(),
            ),
            target_inputter=inputters.WordEmbedder(embedding_size=512),
            encoder=encoders.RNNEncoder(
                num_layers=4,
                num_units=1000,
                dropout=0.2,
                residual_connections=False,
                cell_class=tf.keras.layers.LSTMCell,
            ),
            decoder=decoders.AttentionalRNNDecoder(
                num_layers=4,
                num_units=1000,
                bridge_class=layers.CopyBridge,
                attention_mechanism_class=tfa.seq2seq.LuongAttention,
                cell_class=tf.keras.layers.LSTMCell,
                dropout=0.2,
                residual_connections=False,
            ),
        )
```
YAML (with old way of defining pretrained embeddings):
```yaml
model_dir: test

data:
  train_features_file:
    - src1-train.txt
    - src2-train.txt
    - src3-train.txt
  train_labels_file: tgt-train.txt
  eval_features_file:
    - src1-val.txt
    - src2-val.txt
    - src3-val.txt
  eval_labels_file: tgt-val.txt
  source_1_vocabulary: src1-vocab.txt
  source_2_vocabulary: src2-vocab.txt
  source_3_vocabulary: src3-vocab.txt
  target_vocabulary: tgt-vocab.txt
  source_word_embeddings: src3_embeddings.txt

train:
  batch_size: 64
```