I trained a model on data tokenized with SentencePiece (via onmt_build_vocab) using a large enough vocab size, together with BPE. I don't see any <unk> tokens in the tokenized training data, nor in the generated vocabs. During SentencePiece training it reports 99.99% character coverage. However, I do see <unk> in some cases when running tests with this model. My assumption was that we shouldn't see <unk> in the translation output after SentencePiece tokenization. Is an <unk> token being added during training? The test inputs are tokenized with the same SentencePiece model.
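
In case it helps, here is roughly how I check whether the tokenizer itself emits <unk> on the test inputs (a minimal sketch; the file names spm.model and test.src are placeholders for my actual files):

```python
import sentencepiece as spm

# Placeholder paths; substitute the real SentencePiece model and test source file.
sp = spm.SentencePieceProcessor(model_file="spm.model")
unk = sp.unk_id()  # id reserved for <unk> (0 by default)

with open("test.src", encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        text = line.rstrip("\n")
        ids = sp.encode(text, out_type=int)
        if unk in ids:
            # Print the piece sequence so the uncovered characters are visible
            print(lineno, sp.encode(text, out_type=str))
```

If this flags any lines, the <unk> would be coming from character-coverage gaps in the tokenizer rather than from the translation model itself; on my test set it prints nothing, which is why I'm puzzled.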