I am trying to use OpenNMT-py to create sentence embeddings for the image captions of the MSCOCO 2015 dataset. To do so, I am trying to follow the tutorial on using pretrained GloVe word embeddings and “translate” sentences back to themselves.
First of all, does this make sense, or is there a better approach to creating sentence embeddings?
Are there any suggestions on which models and hyper-parameters to use?
What do you mean by using GloVe word embeddings to “translate” sentences back to themselves?
As I see it, you can get sentence embeddings by:
summing/averaging the word embeddings of the words in the sentence (see the sketch after this list)
using an encoder: for each sentence, the encoder produces an h vector that summarizes the sentence information, as explained here: Sentence embeddings and n-best lists
training a sentence embedding model, like the one proposed by Mikolov here, and using that sentence representation.
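As a concrete illustration of the first option, here is a minimal sketch (plain NumPy, not OpenNMT-py code) that averages pretrained GloVe vectors over a caption; the file path, dimensionality, and helper names are just placeholders:

```python
import numpy as np

def load_glove(path):
    """Read a GloVe text file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_embedding(sentence, vectors, dim=300):
    """Average the vectors of the in-vocabulary tokens of a caption."""
    tokens = sentence.lower().split()
    vecs = [vectors[t] for t in tokens if t in vectors]
    if not vecs:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vecs, axis=0)

glove = load_glove("glove.6B.300d.txt")  # hypothetical local path
emb = sentence_embedding("a man riding a wave on a surfboard", glove)
print(emb.shape)  # (300,)
```

It is a weak baseline (it ignores word order), but it is cheap and often a useful point of comparison.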
If I wanted a good representation of the image captions, I would train a good language model on a big, representative English dataset and then take the sentence representations from that language model.
Remember that a language model is “just” an encoder trained on monolingual data to predict the next word of a sentence.
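To make that concrete, here is a rough PyTorch sketch (my own illustration, not OpenNMT-py’s internals) of such a language model; the final hidden state can then be reused as the sentence representation:

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) of word ids
        x = self.embed(tokens)
        outputs, (h_n, c_n) = self.lstm(x)
        logits = self.out(outputs)   # next-word prediction at each position
        sentence_repr = h_n[-1]      # final hidden state as the sentence vector
        return logits, sentence_repr

model = LSTMLanguageModel(vocab_size=10000)
batch = torch.randint(0, 10000, (4, 12))  # 4 dummy "captions" of 12 tokens
logits, reprs = model(batch)
print(reprs.shape)  # torch.Size([4, 512])
```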
Thanks for your reply! By translating back to themselves, I meant using the Encoder/Decoder architectures provided by OpenNMT-py in an autoencoder fashion, to obtain a compressed latent vector as the representation of each sentence.
Ultimately, I want to pre-train two models and then re-use them in a different architecture: a Sentence-to-Vector encoder and a Vector-to-Text decoder.
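For reference, a very stripped-down PyTorch sketch of that autoencoder idea (again just an illustration, not the actual OpenNMT-py classes) could look like this, where `encode()` would become the Sentence-to-Vector part and the decoder plus output layer the Vector-to-Text part:

```python
import torch
import torch.nn as nn

class SentenceAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, latent_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, vocab_size)

    def encode(self, tokens):
        _, (h_n, _) = self.encoder(self.embed(tokens))
        return h_n                    # (1, batch, latent_dim): the sentence vector

    def forward(self, tokens):
        z = self.encode(tokens)
        # Teacher forcing: decode the same tokens, conditioned on z.
        # (A real setup would use BOS/EOS markers and shifted targets.)
        dec_out, _ = self.decoder(self.embed(tokens), (z, torch.zeros_like(z)))
        return self.out(dec_out)      # logits for reconstructing each token

model = SentenceAutoencoder(vocab_size=10000)
captions = torch.randint(0, 10000, (4, 12))
logits = model(captions)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), captions.reshape(-1))
loss.backward()
```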