Hi everyone, I’ve read the OpenNMT word features documentation, and it says that when we use multiple features, they will be concatenated together. Does that mean that if we have the POS tag NNP with feature vector [1, 2, 3] and the named entity PERSON with feature vector [4, 5, 6], they will be concatenated into [1, 2, 3, 4, 5, 6]? The last sentence of the word features section in the documentation is:
> Finally, the resulting merged embedding is concatenated to the word embedding.
I still don’t understand how the merged feature embedding is concatenated with the word embedding. Can you give me a better explanation?
Thanks for your reply. For example, if I have a pretrained word embedding and a pretrained feature embedding (e.g. a POS tag embedding), then for the token “John|NNP” I would concatenate the embedding vector of the word “John” with the embedding vector of the feature “NNP”, and the result becomes the embedding vector for the token “John|NNP”.
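A toy sketch of that concatenation (the vectors and dimensions below are invented for illustration only, not OpenNMT’s actual embeddings):

```python
# Toy sketch: concatenating a word embedding with a feature embedding.
# All vectors here are made up for illustration.
word_emb = {"John": [0.1, 0.2, 0.3, 0.4]}  # hypothetical 4-dim word embedding
feat_emb = {"NNP": [1.0, 2.0]}             # hypothetical 2-dim POS feature embedding

def embed(token, feature):
    # The final token representation is the word vector followed by the feature vector.
    return word_emb[token] + feat_emb[feature]  # list concatenation: 4 + 2 = 6 dims

vec = embed("John", "NNP")
print(vec)  # [0.1, 0.2, 0.3, 0.4, 1.0, 2.0]
```

So the model sees a single 6-dimensional vector for “John|NNP” rather than the word and feature vectors separately.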
Hey @lengockyquang @guillaumekln
Do the word features also work with the subword model implemented by SentencePiece?
Even after using u"\uFFE8" as a separator between words and features, the preprocessing script is not recognizing it as a feature, i.e. the number of features is still 0.
Can you help me out with adding features to words?
Thanks
The code looks something like this:

```python
token.text + u"\uFFE8" + token.pos_
```
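A fuller, self-contained version of that snippet (the `Token` namedtuple here is a stand-in for the spaCy-style token with `.text` and `.pos_` that I’m actually using):

```python
# Sketch of building an annotated line using U+FFE8 ("￨"), the separator
# OpenNMT expects between a word and its features.
from collections import namedtuple

SEP = u"\uFFE8"

# Stand-in for spaCy tokens; only .text and .pos_ matter here.
Token = namedtuple("Token", ["text", "pos_"])
tokens = [Token("John", "NNP"), Token("lives", "VBZ"), Token("here", "RB")]

# Join each word with its POS feature, then join tokens with spaces.
line = " ".join(token.text + SEP + token.pos_ for token in tokens)
print(line)  # John￨NNP lives￨VBZ here￨RB
```

The resulting line is what I write to the training file before running the preprocessing script.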