Hi all,
I want to train a BPE model using the Python pyonmttok module. I use the following code:
import pyonmttok
fichero = 'en.10k'
learner = pyonmttok.BPELearner(symbols=1000)
learner.ingest_file(fichero)
tokenizer = learner.learn('en.10k.model')
tokenizer.tokenize_file('en.10k', 'en.10k.tok')
Is it possible to get a vocab file from the learner/tokenizer that is compatible with OpenNMT-tf training?
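To make clear what I mean by a vocab file, here is a rough sketch of a workaround I could fall back on. It assumes the vocabulary is a plain text file with one token per line and that the tokenized output separates tokens with single spaces. The function name build_vocab and the output name en.10k.vocab are my own, not part of pyonmttok or OpenNMT-tf. As far as I understand, OpenNMT-tf also expects some special tokens in the vocabulary, which this sketch does not add.

```python
from collections import Counter


def build_vocab(tokenized_path, vocab_path, size=None):
    """Write the tokens of a tokenized file to vocab_path, one per line.

    Hypothetical helper: not part of pyonmttok or OpenNMT-tf.
    """
    counts = Counter()
    with open(tokenized_path, encoding="utf-8") as f:
        for line in f:
            # Assumption: tokens are separated by single spaces.
            counts.update(tok for tok in line.rstrip("\n").split(" ") if tok)

    # Most frequent first; ties broken alphabetically so the output is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    tokens = [tok for tok, _ in ordered]
    if size is not None:
        tokens = tokens[:size]

    with open(vocab_path, "w", encoding="utf-8") as f:
        for tok in tokens:
            f.write(tok + "\n")
    return tokens
```

With the files above, this would be called as build_vocab('en.10k.tok', 'en.10k.vocab'). I would still prefer a supported way to get the vocabulary directly from the learner or tokenizer if one exists.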
Regards and thanks in advance