I want to use GloVe embeddings concatenated with features: each token in my .src file has the form w|f1|f2|f3|f4, and I want w to get a GloVe embedding, with the four features concatenated onto it. How do I do this?
When I create a vocabulary with
tools/embeddings_to_torch.py given the
w|f1|f2|f3|f4 .src file, the keys look like
w|f1|f2|f3|f4 (e.g. winning|VERB|amod|NONE|O), but I would like the keys to be plain words (e.g. winning). Otherwise it makes no sense to use GloVe embeddings, since these w|f1|f2|f3|f4 keys will not match any GloVe word vectors.
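As a preprocessing step, one option is to strip the features from every token before building the embedding vocabulary. A minimal sketch (assuming the pipe character never occurs inside a word itself; the function name is my own, not part of OpenNMT):

```python
def strip_features(line):
    """Drop the |-separated features from each token: 'w|f1|f2|f3|f4' -> 'w'."""
    return " ".join(tok.split("|")[0] for tok in line.split())

# Example token sequence in the w|f1|f2|f3|f4 format:
print(strip_features("winning|VERB|amod|NONE|O the|DET|det|NONE|O"))
# -> winning the
```

Running this over every line of the .src file produces a word-only copy whose vocabulary keys (e.g. winning) match GloVe entries, which can then be passed to tools/embeddings_to_torch.py.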
My current approach is as follows: I created a vocabulary from a .src file that is identical except that the features are stripped (only w), and extracted the GloVe embeddings based on the vocabulary of that file. For training I then use the w|f1|f2|f3|f4 .src files together with the w GloVe embeddings. However, I am not sure this has the desired effect (probably not). During training, I use -feat_merge concat and