I want to use GloVe embeddings concatenated with features. Each token in my .src file has the form w|f1|f2|f3|f4, and I want w to be represented by a GloVe embedding, with the embeddings of the four features concatenated to it. How do I do this?
When I create a vocabulary with tools/embeddings_to_torch.py from the w|f1|f2|f3|f4 .src file, the keys look like w|f1|f2|f3|f4 (e.g. winning|VERB|amod|NONE|O), but I would like the keys to be words (e.g. winning). Otherwise it makes no sense to use GloVe embeddings, since these w|f1|f2|f3|f4 keys will not match any GloVe word vectors.
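To make the mismatch concrete, here is a minimal sketch of how word-only keys could be derived from the feature-annotated tokens. The helper names (strip_features, strip_line, write_word_only_file) are hypothetical and not part of OpenNMT-py, and the separator is assumed to be the plain "|" shown above; if the data uses a different separator character, pass it as sep.

```python
def strip_features(token, sep="|"):
    """Return only the word part of a w|f1|f2|f3|f4 token."""
    # Split once at the first separator; everything after it is features.
    return token.split(sep, 1)[0]


def strip_line(line, sep="|"):
    """Strip the features from every whitespace-separated token in a line."""
    return " ".join(strip_features(tok, sep) for tok in line.split())


def write_word_only_file(src_path, out_path, sep="|"):
    """Write a copy of the .src file that contains words only (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            out.write(strip_line(line, sep) + "\n")
```

With this, winning|VERB|amod|NONE|O maps to winning, which is a key that can actually be looked up in the GloVe file. The sketch assumes that no word itself contains the separator.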
My current approach is as follows: I created a vocabulary from a .src file that contains only w (no features, but otherwise identical) and built the GloVe embeddings from the vocabulary of this file. For training I use the w|f1|f2|f3|f4 .src files together with these word-only GloVe embeddings. However, I am not sure this has the desired effect (probably not). During training, I use -feat_merge concat and -feat_vec_exponent 0.7.
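For reference, a rough sketch of what these two options are expected to produce. It assumes, based on my reading of the OpenNMT-py option descriptions, that with -feat_merge concat and no explicit feature vector size, each feature gets an embedding of size N^feat_vec_exponent (rounded down), where N is the number of distinct values of that feature, and that the token vector is the word embedding followed by the feature embeddings. The feature vocabulary sizes and the 300-dimensional GloVe vectors below are made-up illustration values, and the function names are hypothetical.

```python
def feature_embedding_dims(feat_vocab_sizes, exponent=0.7):
    """Embedding size per feature, assumed to be floor(N ** exponent)."""
    return [int(n ** exponent) for n in feat_vocab_sizes]


def concat_embedding_size(word_dim, feat_vocab_sizes, exponent=0.7):
    """Total token vector size when feature embeddings are concatenated to the word embedding."""
    return word_dim + sum(feature_embedding_dims(feat_vocab_sizes, exponent))


# Hypothetical sizes for f1..f4 (e.g. POS tags, dependency labels, two small tag sets).
feat_vocab_sizes = [17, 45, 5, 9]
feat_dims = feature_embedding_dims(feat_vocab_sizes, exponent=0.7)
total_dim = concat_embedding_size(300, feat_vocab_sizes, exponent=0.7)
print(feat_dims, total_dim)
```

Under these assumptions the GloVe part keeps its own dimensionality and only the four feature embeddings are sized by the exponent, so the input to the encoder is wider than the GloVe vectors alone.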