Is it possible to use a vocabulary built from a dictionary that is separate from the training data, and then train on that data?
Hey @leokonst
Yes, both OpenNMT-py and OpenNMT-tf can be given pre-existing vocabs to build from.
For OpenNMT-py you can have a look at the -src_vocab / -tgt_vocab flags.
For OpenNMT-tf you set source_vocabulary and target_vocabulary in the config.
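If your dictionary is just a word list, here is a minimal sketch of turning it into a plain vocab file with one token per line, which is the kind of file these options are usually pointed at. The word list, the `write_vocab` helper and the `src_vocab.txt` filename are made up for illustration, and the exact format (special tokens, optional counts) depends on the toolkit and version, so check the docs for yours.

```python
from pathlib import Path


def write_vocab(words, path):
    """Write unique, non-empty tokens to `path`, one per line, in first-seen order."""
    seen = set()
    tokens = []
    for word in words:
        token = word.strip()
        # Skip blanks, duplicates, and entries that are not a single token.
        if not token or token in seen or len(token.split()) != 1:
            continue
        seen.add(token)
        tokens.append(token)
    Path(path).write_text("\n".join(tokens) + "\n", encoding="utf-8")
    return tokens


# Hypothetical dictionary entries; in practice these come from your own word list.
dictionary_words = ["hello", "world", "hello", "  ", "good morning", "translate"]
vocab = write_vocab(dictionary_words, "src_vocab.txt")
print(vocab)
```

Multi-word entries like "good morning" are dropped here because a word-level vocab expects one token per line. You may prefer to split them into separate words instead.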
Small tip if you’re new to NMT: subword methods (SentencePiece/BPE) usually work better than word-level vocabs.
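One way to see why: with a word-level vocab taken from an external dictionary, any training token that is not in the dictionary ends up as an unknown token, whereas subword models can usually split it into known pieces. A rough sketch for checking how much of your data would be affected, assuming whitespace tokenization (the `oov_rate` helper and the toy data are hypothetical):

```python
def oov_rate(sentences, vocab):
    """Return the fraction of whitespace-separated tokens not found in `vocab`."""
    vocab = set(vocab)
    tokens = [tok for line in sentences for tok in line.split()]
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Hypothetical toy data for illustration only.
dictionary_vocab = ["the", "cat", "sat", "on", "mat"]
training_sentences = ["the cat sat on the mat", "the cats were sitting"]
rate = oov_rate(training_sentences, dictionary_vocab)
print(f"{rate:.0%} of training tokens are out of vocabulary")
```

If the rate on your real data is more than a few percent, that is a good sign a subword vocabulary would serve you better.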
Definitely. That’s a tip worth following.