BPELearner from pyonmttok

Hi all,
I want to train a BPE model with the pyonmttok Python package. I use the following code:
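import pyonmttok

fichero = 'en.10k'  # raw training corpus
learner = pyonmttok.BPELearner(symbols=1000)  # number of BPE symbols (merge operations) to learn
learner.ingest_file(fichero)
# learn() writes the BPE model to en.10k.model and returns a Tokenizer that uses it
tokenizer = learner.learn('en.10k.model')
# Apply the learned BPE model to the corpus; the tokenized text goes to en.10k.tok
tokenizer.tokenize_file('en.10k', 'en.10k.tok')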
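For context on what BPELearner(symbols=1000) computes: classic BPE learning starts from characters and repeatedly merges the most frequent adjacent pair of symbols, up to the requested number of merges. The sketch below is a toy, pure-Python illustration of that idea using the subword-nmt convention of an "</w>" end-of-word marker. It is not pyonmttok's implementation, and learn_bpe is a hypothetical helper written only for illustration.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Toy BPE learner (hypothetical helper): return merged pairs in order.

    Each word is split into characters plus an "</w>" end-of-word marker,
    following the subword-nmt convention.
    """
    # Words represented as tuples of symbols, with their corpus counts.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

The real learner works on a whole corpus and offers more options (minimum frequency, pre-tokenization), but the core loop is the same greedy "merge the most frequent pair" step.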

Is it possible to get a vocabulary file from the learner/tokenizer that is compatible with OpenNMT-tf training?

Regards and thanks in advance


Hi,

You can run onmt-build-vocab on a training file that was tokenized with your BPE model (en.10k.tok in your example).
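onmt-build-vocab is the tool to use for real training (check `onmt-build-vocab -h` for its exact options, such as `--save_vocab` and `--size`). To illustrate what it produces, here is a minimal hand-rolled sketch that counts the space-separated tokens in the BPE-tokenized file and writes them one per line, most frequent first. It assumes OpenNMT-tf reads a plain one-token-per-line vocabulary; onmt-build-vocab may also take care of details such as special tokens (e.g. <blank>, <s>, </s>), so prefer the official tool. build_vocab is a hypothetical helper, not part of pyonmttok or OpenNMT-tf.

```python
from collections import Counter


def build_vocab(tokenized_path, vocab_path, size=None, min_frequency=1):
    """Hypothetical helper: write a frequency-sorted vocabulary, one token per line.

    `tokenized_path` is assumed to be whitespace-tokenized text, e.g. the
    output of tokenizer.tokenize_file() with a learned BPE model.
    """
    counts = Counter()
    with open(tokenized_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    # Most frequent first; ties sorted alphabetically for a stable output.
    tokens = sorted(
        (tok for tok, n in counts.items() if n >= min_frequency),
        key=lambda tok: (-counts[tok], tok),
    )
    if size is not None:
        tokens = tokens[:size]  # keep only the `size` most frequent tokens
    with open(vocab_path, "w", encoding="utf-8") as f:
        for tok in tokens:
            f.write(tok + "\n")
    return tokens
```

For the files in the question this would be called as build_vocab('en.10k.tok', 'en.10k.vocab'), with the resulting file then referenced as the vocabulary in the OpenNMT-tf configuration.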
