Saving embedding weights from LM

sonalsannigrahi · March 12, 2021, 9:42pm

Hello,

I have a quick question about saving embeddings. I would like to train a masked language model on a text corpus and then save the final trained embedding weights to load into an NMT decoder. Is this possible? If not, how can I go about saving embeddings from a trained model? Thank you in advance!

francoishernandez · March 13, 2021, 1:34pm

Which version of OpenNMT are you using?

sonalsannigrahi · March 13, 2021, 1:36pm

I just cloned the repository last week so it should be the latest version from GitHub (OpenNMT-py)

francoishernandez · March 15, 2021, 9:37am

This is quite similar to:

Retrieve NMT Encoder's output · Issue #1811 · OpenNMT/OpenNMT-py · GitHub
How can I get some hidden states of model(such as self-attention matrix) during translation

There is no pre-existing way to do such a thing, but it shouldn’t be very difficult to add the few lines needed to dump those, or extract them from an existing checkpoint.

You can access all the parameters directly by loading a checkpoint:

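import torch

# Load the checkpoint onto the CPU; replace the path with your own checkpoint file.
checkpoint = torch.load("my_checkpoint.pt", map_location="cpu")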

This checkpoint is a Python dict containing various entries, one of which is 'model', the state_dict of the PyTorch model.
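As a rough sketch of how the embedding weights could be pulled out of that 'model' entry: the snippet below filters the state_dict by parameter name and saves the matching tensors to their own file. The helper `extract_embeddings`, the keyword "embeddings", and the parameter names in the toy checkpoint are all assumptions made for illustration. Print the keys of your own checkpoint's state_dict to find the real names before relying on the filter.

```python
import os
import tempfile

import torch


def extract_embeddings(checkpoint, keyword="embeddings"):
    """Return the entries of checkpoint['model'] whose name contains `keyword`.

    Hypothetical helper: the keyword is a guess at how embedding parameters
    are named, so inspect checkpoint['model'].keys() on a real checkpoint.
    """
    state_dict = checkpoint["model"]
    return {name: tensor for name, tensor in state_dict.items() if keyword in name}


# Toy stand-in for a real checkpoint; these parameter names are made up.
toy_checkpoint = {
    "model": {
        "encoder.embeddings.weight": torch.randn(10, 4),
        "encoder.layer.weight": torch.randn(4, 4),
        "decoder.embeddings.weight": torch.randn(12, 4),
    }
}

with tempfile.TemporaryDirectory() as tmp:
    # Save and reload the toy checkpoint to mimic reading a file from disk.
    ckpt_path = os.path.join(tmp, "toy_checkpoint.pt")
    torch.save(toy_checkpoint, ckpt_path)
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    # Keep only the embedding tensors and write them to a separate file.
    embeddings = extract_embeddings(checkpoint)
    emb_path = os.path.join(tmp, "embeddings.pt")
    torch.save(embeddings, emb_path)
    reloaded = torch.load(emb_path, map_location="cpu")

# List what was extracted, with each tensor's shape.
for name, tensor in reloaded.items():
    print(name, tuple(tensor.shape))
```

The saved tensors could then be copied into another model's embedding layer, for example with `load_state_dict(..., strict=False)` after renaming the keys to match the target model, provided the vocabulary and embedding size are the same on both sides.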