Oh, right. Then you might want to use subwords indeed.
Dear all,
I further trained (retrained) a model on the same training sets but with two different vocabulary sets, like this:
(X + Y) Vocabulary_set_1
(X + Y) Vocabulary_set_2
That is, the vocabulary sets are different and were generated from different data sources using the onmt_build_vocab script (a sketch of the config is below).
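For reference, each vocabulary was built with a config roughly along these lines; the paths, corpus names, and sample size here are just placeholders, not my exact setup:

```yaml
# build_vocab.yaml -- one such file per vocabulary set
save_data: run/samples                 # where transformed samples are written
src_vocab: run/vocab_set_1.src         # output vocabulary files
tgt_vocab: run/vocab_set_1.tgt
overwrite: false
data:
    corpus_1:
        path_src: data/src-train.txt   # the data source this vocabulary is built from
        path_tgt: data/tgt-train.txt
```

and then run with something like `onmt_build_vocab -config build_vocab.yaml -n_sample 10000`.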
After doing this, I got the exact same translation performance in both scenarios.
(i) Does this mean OpenNMT simply ignores the provided vocabulary sets and generates its own from the training data, or perhaps reuses the vocabulary set of the first model (X)?
(ii) Is there a way to update the vocabulary sets in OpenNMT? NB: there is an update_vocab parameter, but I am not sure whether it is the right way to do this; it also requires reset_optim, whose choices are not clear to me.
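From what I can tell (assuming OpenNMT-py 2.x; the paths, checkpoint name, and step counts below are placeholders), the intended usage would be something like the following, but please correct me if this is wrong:

```yaml
# finetune.yaml -- continue training from a checkpoint with a new vocabulary
src_vocab: run/vocab_set_2.src               # the *new* vocabulary files
tgt_vocab: run/vocab_set_2.tgt
data:
    corpus_1:
        path_src: data/src-train.txt
        path_tgt: data/tgt-train.txt

train_from: models/model_X_step_100000.pt    # checkpoint of the first model (X)
update_vocab: true                           # keep weights for overlapping tokens, add rows for new ones (my understanding)
reset_optim: all                             # I believe update_vocab needs 'states' or 'all';
                                             # the choices seem to be none / all / states / keep_states

save_model: models/model_retrained
train_steps: 120000
```

started with `onmt_train -config finetune.yaml`. My understanding is that without update_vocab (and with reset_optim: none) the checkpoint's original vocabulary is kept, which might explain why I see identical results, but I would appreciate confirmation.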
Does anyone have experience retraining a model with a different vocabulary set? Do you have any pointers that could walk me through the OpenNMT retraining procedure?
Thank you.