I hope I'm not posting too many topics.
I want a stable way to calculate the BLEU score as I improve my model.
So I need a test set without OOV (out-of-vocabulary) words.
I think I need to get the list of words known to my model file (.pt) and remove any OOV words from my test file.
So, how can I get the list of words from a .pt file? Or can I get it during training somehow?
The vocabulary is stored as a torchtext Vocab object inside your checkpoint.
You can load your checkpoint with torch.load, which will give you a Python dict. One of the keys is "vocab".
You can then read the word list from the Vocab object stored under that key.
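Here is a rough sketch of how that could look in Python. The helper names (load_vocab_entry, keep_in_vocab_pairs) and the file name in the usage comment are made up for illustration. It also assumes you want to drop whole sentence pairs whose source side contains an unknown token, and that tokens are separated by whitespace. The exact layout of the "vocab" entry depends on your OpenNMT-py and torchtext versions, so print it once and check before relying on any attribute.

```python
def load_vocab_entry(checkpoint_path):
    """Load a .pt checkpoint and return whatever is stored under "vocab"."""
    import torch  # imported here so the filtering helper works without torch

    # map_location="cpu" avoids needing a GPU just to inspect the file.
    # Note: recent PyTorch versions may need weights_only=False to unpickle
    # non-tensor objects such as a Vocab.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    return checkpoint["vocab"]


def keep_in_vocab_pairs(src_lines, tgt_lines, known_words):
    """Keep (source, target) pairs whose source tokens are all known words.

    Filtering both sides together keeps the files aligned for BLEU.
    """
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        if all(token in known_words for token in src.split()):
            kept.append((src, tgt))
    return kept


# Hypothetical usage (inspect the entry first, its structure may differ):
#   vocab_entry = load_vocab_entry("model_step_10000.pt")
#   print(type(vocab_entry))
#   # ASSUMPTION: a legacy torchtext Vocab exposes a `stoi` mapping
#   known_words = set(source_vocab.stoi)
#   pairs = keep_in_vocab_pairs(src_lines, tgt_lines, known_words)
```

Dropping whole sentences rather than deleting single words keeps each remaining test sentence intact, so the references still match what the model is asked to translate.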