Vocabulary in NMT

Is it possible to use a vocabulary from a dictionary separate from the training data, and then train with separate training data?

Hey @leokonst
Yes, both OpenNMT-py and OpenNMT-tf can be given pre-existing vocabs to build from.
For OpenNMT-py, have a look at the -src_vocab / -tgt_vocab flags.
For OpenNMT-tf, set source_vocabulary and target_vocabulary in the config.
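
If it helps, here is a minimal sketch of turning a separate dictionary into a vocabulary file. It assumes a plain-text format with one token per line; the expected format can differ between toolkits and versions, so check the docs for the one you use. The function name `write_vocab`, the sample words, and the file name are all hypothetical.

```python
import os
import tempfile


def write_vocab(words, path):
    """Write unique, non-empty tokens to `path`, one per line, in first-seen order."""
    seen = set()
    tokens = []
    for word in words:
        token = word.strip()
        if token and token not in seen:
            seen.add(token)
            tokens.append(token)
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")
    return tokens


# Hypothetical dictionary entries, independent of the training corpus.
dictionary_words = ["hello", "world", "hello", " house ", ""]

# Assumed output location; in practice this would sit next to your config.
vocab_path = os.path.join(tempfile.mkdtemp(), "src-vocab.txt")
vocab_tokens = write_vocab(dictionary_words, vocab_path)
print(vocab_tokens)
```

You would then point the vocab options mentioned above at the resulting file. Two caveats: training tokens that are missing from the vocabulary end up as the unknown token, and the toolkit may expect special tokens (padding, start, end) in the file, which the bundled vocabulary-building scripts normally take care of.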

Small tip if you’re new to NMT: subword methods (SentencePiece/BPE) usually work better than word-level vocabularies.
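
A toy illustration of why, as a sketch only: it uses a simple greedy longest-match segmentation, which is not the actual BPE or SentencePiece algorithm, and the vocabularies and the `segment` helper are made up for the example. A word that is missing from a word-level vocabulary becomes an unknown token, while a subword vocabulary can still represent it as pieces.

```python
def segment(word, subwords, unk="<unk>"):
    """Greedily split `word` into the longest matching pieces from `subwords`."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the subword set.
        while end > start and word[start:end] not in subwords:
            end -= 1
        if end == start:
            # No piece matches at this position: give up on the whole word.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabularies for illustration.
word_vocab = {"translate", "model"}
subword_vocab = {"trans", "lat", "ion", "model", "s"}

word = "translation"
word_level = [word] if word in word_vocab else ["<unk>"]
subword_level = segment(word, subword_vocab)

print(word_level)     # ['<unk>']
print(subword_level)  # ['trans', 'lat', 'ion']
```

Real subword tools learn the pieces from data instead of using a hand-written set, but the effect on out-of-vocabulary words is the same idea.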

Definitely. That’s a tip worth following :slight_smile: