Getting the vocabulary of a trained model, and the vocabulary with BPE

(Zuzanna Parcheta) #1

Hi! Is it possible to get the vocabulary of a trained model? For example, if I limit the source and target vocabularies to 20,000 entries, can I see which words the model chose?

One more question: if I limit the vocabulary to 20,000 and use the tok_src_bpe_model and tok_tgt_bpe_model options, will the vocabulary used to train the translation model consist of BPE subwords or of the original words?
In other words, will it look like “s, everal, l, ost, probl, em, relev, ant”, which are subwords from the BPE model, or like “several, lost, problem, relevant”?

Thanks for everything!

(Zuzanna Parcheta) #2

OK, I see. The dictionary is created in the output folder after the preprocessing step.
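To inspect that dictionary programmatically, here is a minimal Python sketch. It assumes (not confirmed in this thread) that each line holds a token followed by its integer index, separated by whitespace, and that any extra columns, such as a frequency count, can be ignored. Listing the entries also helps answer the BPE question: if BPE was applied before the vocabulary was built, the entries you see will be subword units rather than whole words.

```python
def load_vocab(path):
    """Return the tokens of a dictionary file, ordered by their index.

    Assumed format: one entry per line, "<token> <index> [extra columns...]".
    Extra columns are ignored; blank or malformed lines are skipped.
    """
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            token, index = parts[0], int(parts[1])
            entries.append((index, token))
    # Sort by index so the list mirrors the dictionary's id order.
    return [token for _, token in sorted(entries)]
```

For example, `vocab = load_vocab("data/demo.src.dict")` followed by `print(len(vocab), vocab[:50])` shows the size and the first entries. The path here is hypothetical; use whatever file the preprocessing step wrote to your output folder.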