OpenNMT Forum

Getting the vocabulary after preprocessing

How do I get the vocabulary (the token-to-index mapping) after preprocessing with preprocess.py? I get the train and validation dataset files and a vocab file. The vocab file is a dict mapping field names to TextMultiField objects, and I cannot find any vocabulary mapping in these files.

Hi Rajashan,

It is covered here https://github.com/OpenNMT/OpenNMT-py/issues/332

But I don’t get torchtext.vocab.Vocab objects; I get onmt.inputters.text_dataset.TextMultiField objects, which don’t seem to be vocabularies.

Maybe there are more experienced forum members who can help you with it. This approach has always worked for me when opening ‘.vocab.pt’ files.
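For the TextMultiField layout described in the question, each TextMultiField wraps one or more torchtext Fields, and the token vocabulary is attached to the first (base) Field, reachable through its base_field property. Below is a minimal sketch, assuming an OpenNMT-py 1.x vocab file that unpickles to a dict such as {"src": TextMultiField, "tgt": TextMultiField}. It also assumes that older releases saved a list of (name, Vocab) pairs, which would explain why the linked approach returns Vocab objects directly; treat that as an assumption. The helper names field_vocab, extract_vocabs and load_fields are hypothetical. Only base_field, vocab, itos and stoi come from OpenNMT-py and torchtext.

```python
"""Sketch: pull token<->index mappings out of an OpenNMT-py *.vocab.pt file.

Assumption: the file holds either a dict {name: TextMultiField or Field}
(newer layout) or a list of (name, Vocab) pairs (assumed older layout).
The helper names below are hypothetical, not part of OpenNMT-py.
"""


def field_vocab(field):
    """Return the torchtext Vocab behind one entry of the vocab file."""
    # TextMultiField exposes its main torchtext Field as `base_field`;
    # a plain torchtext Field (or a bare Vocab) is used as-is.
    base = getattr(field, "base_field", field)
    if hasattr(base, "itos"):
        return base  # already a Vocab (assumed older list-of-pairs layout)
    return base.vocab  # raises AttributeError if the field has no vocab


def extract_vocabs(fields):
    """Map each field name to (itos, stoi), skipping fields without a vocab."""
    if isinstance(fields, list):
        fields = dict(fields)  # assumed older layout: [(name, Vocab), ...]
    vocabs = {}
    for name, field in fields.items():
        try:
            vocab = field_vocab(field)
        except AttributeError:
            continue  # e.g. a numeric "indices" field has no vocabulary
        # itos: list index -> token; stoi: token -> index
        vocabs[name] = (vocab.itos, vocab.stoi)
    return vocabs


def load_fields(path):
    """Unpickle a vocab file (requires onmt and torchtext to be importable)."""
    import torch  # imported lazily so the helpers above work without torch

    try:
        # Recent PyTorch versions default to weights_only=True, which refuses
        # arbitrary pickled objects such as these fields. Only load trusted files.
        return torch.load(path, weights_only=False)
    except TypeError:
        # Older PyTorch versions do not accept the weights_only argument.
        return torch.load(path)
```

Usage, with a hypothetical path: call fields = load_fields("data/demo.vocab.pt"), then vocabs = extract_vocabs(fields). After that, vocabs["src"][0][:10] lists the first ten source tokens, and vocabs["tgt"][1]["hello"] gives the target index of "hello". Unpickling needs OpenNMT-py and torchtext installed because the file stores instances of their classes.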