I am trying to combine word embeddings and document embeddings.
- I have pretrained, fixed (non-trainable) standard word embeddings of dimension 512.
- My corpus is made of documents, and I have an embedding for each document, also of dimension 512.
- For each word, I know which document it appears in.
I’ve heard that I can annotate the corpus as shown below and tell OpenNMT-tf to concatenate each word embedding with the embedding of the document it belongs to (concatenation is the combination I want here), during both training and inference:
"The|doc1 cat|doc1 is|doc1 …
Fruits|doc2 are|doc2 delicious|doc2 …"
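To be concrete about the combination I mean, here is a minimal NumPy sketch (not OpenNMT-tf code): each `word|doc` token is split into its word and its document ID, and the 512-dimensional word vector is concatenated with the 512-dimensional document vector, giving a 1024-dimensional input per token. The random tables are placeholders for the real pretrained embeddings, and the helpers `parse_annotated` and `concat_embeddings` are hypothetical names used only for illustration.

```python
import numpy as np

WORD_DIM = 512  # dimension of the fixed word embeddings
DOC_DIM = 512   # dimension of the document embeddings


def parse_annotated(line, sep="|"):
    """Hypothetical helper: split 'word|doc' tokens into parallel lists."""
    words, docs = [], []
    for token in line.split():
        # rsplit keeps any '|' inside the word itself; assumes every token has a separator
        word, doc = token.rsplit(sep, 1)
        words.append(word)
        docs.append(doc)
    return words, docs


def concat_embeddings(words, docs, word_table, doc_table):
    """Hypothetical helper: return a (num_tokens, WORD_DIM + DOC_DIM) matrix.

    Row i is [word vector of token i ; vector of the document token i belongs to].
    """
    word_vecs = np.stack([word_table[w] for w in words])
    doc_vecs = np.stack([doc_table[d] for d in docs])
    return np.concatenate([word_vecs, doc_vecs], axis=-1)


# Placeholder tables standing in for the real pretrained embeddings.
rng = np.random.default_rng(0)
word_table = {w: rng.standard_normal(WORD_DIM) for w in ["The", "cat", "is"]}
doc_table = {"doc1": rng.standard_normal(DOC_DIM)}

words, docs = parse_annotated("The|doc1 cat|doc1 is|doc1")
inputs = concat_embeddings(words, docs, word_table, doc_table)
print(inputs.shape)  # (3, 1024)
```

Every token of the same document gets the same second half, so the document vector acts as a per-document context added alongside the word vector.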
However, I don’t know where to start when training a model that uses this feature.
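One possible first step, assuming (from my reading of the documentation, not yet confirmed) that OpenNMT-tf expects each extra input stream as a separate file aligned token by token with the words, for example through `opennmt.inputters.ParallelInputter` with an `opennmt.layers.ConcatReducer` to concatenate the embeddings, would be to split the annotated corpus into two parallel files. The function `split_annotated_corpus` and the file names below are hypothetical.

```python
import tempfile
from pathlib import Path


def split_annotated_corpus(src_path, words_path, docs_path, sep="|"):
    """Hypothetical helper: write two token-aligned files from a 'word|doc' corpus.

    Token i of line n in words_path corresponds to token i of line n in
    docs_path. Assumes every token contains the separator.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(words_path, "w", encoding="utf-8") as words_out, \
         open(docs_path, "w", encoding="utf-8") as docs_out:
        for line in src:
            pairs = [token.rsplit(sep, 1) for token in line.split()]
            words_out.write(" ".join(word for word, _ in pairs) + "\n")
            docs_out.write(" ".join(doc for _, doc in pairs) + "\n")


# Small sample built from the example above, written to a temporary directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "train.annotated").write_text(
    "The|doc1 cat|doc1 is|doc1\nFruits|doc2 are|doc2 delicious|doc2\n",
    encoding="utf-8",
)
split_annotated_corpus(tmp / "train.annotated", tmp / "train.words", tmp / "train.docs")
print((tmp / "train.words").read_text(encoding="utf-8"))
print((tmp / "train.docs").read_text(encoding="utf-8"))
```

The words file would then go with the fixed word embeddings and the document-ID file with the document embeddings, each as its own input stream.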
Thank you in advance,