I used preprocess.py to generate a *.vocab.pt file with the -shared_vocab flag on. The output has these attributes:
{'src': <onmt.inputters.text_dataset.TextMultiField at 0x7fa28f4e6410>,
'tgt': <onmt.inputters.text_dataset.TextMultiField at 0x7fa258920210>,
'indices': <torchtext.data.field.Field at 0x7fa258920310>}
I am trying to extract the frequency of each word in the vocabulary so that I can match each frequency with the corresponding row in the embedding matrix. How should I do it?
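
For reference, here is a minimal sketch of the kind of lookup I have in mind. It assumes that each TextMultiField exposes its underlying torchtext field as `base_field`, and that the field's `vocab` carries `itos` (index to token) and `freqs` (a Counter of token counts). I have not confirmed this against my installed version. The helper name `frequencies_by_index`, the toy vocabulary, and the file path in the comments are all made up for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def frequencies_by_index(vocab):
    """Return (token, count) pairs ordered by vocabulary index.

    Position i in the result corresponds to vocab.itos[i], which is
    assumed to be the token for row i of the embedding matrix.
    Tokens missing from vocab.freqs (e.g. special tokens) get 0.
    """
    return [(tok, vocab.freqs.get(tok, 0)) for tok in vocab.itos]


# Toy stand-in with the same two attributes, so the sketch runs
# without torch or OpenNMT installed.
toy_vocab = SimpleNamespace(
    itos=["<unk>", "<blank>", "the", "cat"],
    freqs=Counter({"the": 5, "cat": 2}),
)
rows = frequencies_by_index(toy_vocab)

# With the real file (assumed attribute names, hypothetical path):
# import torch
# fields = torch.load("data.vocab.pt")
# vocab = fields["src"].base_field.vocab
# rows = frequencies_by_index(vocab)
```

If -shared_vocab makes 'src' and 'tgt' point at the same vocabulary, either key should presumably give the same result, but I am not sure whether the counts are then merged across both sides.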