geckuba
November 16, 2020, 8:48pm
1
Hi everyone!
I want to combine char embeddings with word embeddings (preferably BERT) in opennmt-py.
It is my understanding that there is a way to combine different types of embeddings in opennmt-tf, e.g. as discussed in these threads:
Hello all,
I try to mix word and document embeddings
Let’s say:
I have pretrained and fixed classic word embeddings (dim 512).
My corpus is made of documents so I also have document embeddings for each document (dim 512).
I know, for each word, which document it appears in.
I’ve heard I can create a corpus like this (see below) and tell opennmt-tf to combine (say I want to concatenate here) the embedding of each word with its corresponding document embedding during training/inference:
"The|do…
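The concatenation described in the quoted thread can be sketched roughly as below. This is a toy illustration, not real OpenNMT code: the lookup tables, names, and dimensions (4 instead of 512) are invented for readability.

```python
# Toy stand-ins for pretrained embedding tables (hypothetical values;
# real word/document embeddings would be dim 512 as in the quoted post).
word_emb = {
    "The": [0.1, 0.2, 0.3, 0.4],
    "cat": [0.5, 0.6, 0.7, 0.8],
}
doc_emb = {
    "doc1": [0.9, 1.0, 1.1, 1.2],
}

def combine(word, doc):
    """Concatenate a word embedding with the embedding of its document."""
    return word_emb[word] + doc_emb[doc]  # list concatenation

vec = combine("The", "doc1")
print(vec)  # [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]
```

The resulting input vector has dimension word_dim + doc_dim (here 4 + 4 = 8; with the 512-dim embeddings from the quoted post it would be 1024), which the model's first layer then has to expect.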
Is there a way to combine BPE (byte pair encoding) with a factored model, for instance with a word|lemma|pos combination?
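To make the factored format above concrete, here is a minimal sketch of how a word|lemma|pos token could be turned into one input vector: each factor gets its own embedding table, and the per-factor embeddings are concatenated. The tables, values, and dimensions are all hypothetical, chosen only to illustrate the format.

```python
# One toy embedding table per factor (surface form, lemma, POS).
# Values and dims (2 each) are invented for the example.
factor_tables = [
    {"cats": [0.1, 0.2]},   # surface form
    {"cat":  [0.3, 0.4]},   # lemma
    {"NOUN": [0.5, 0.6]},   # part of speech
]

def embed_factored(token):
    """Split a factored token on '|' and concatenate its factor embeddings."""
    factors = token.split("|")
    vec = []
    for table, factor in zip(factor_tables, factors):
        vec += table[factor]
    return vec

print(embed_factored("cats|cat|NOUN"))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

With BPE in the mix, the open question from the quote is how the lemma/POS factors should be replicated or aligned across the subword pieces of a split word.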
Is there something similar in opennmt-py, preferably with support of BERT embeddings?
Hi there,
There are a few bricks for using pre-trained embeddings, but you may need to adapt a few things (conversion scripts, etc.).
It would not be combining several types of embeddings, though.