If I get new training data for an already pretrained engine, I understand that one can perform incremental training by passing the pretrained model as an argument to the
In doing so, should the procedure differ depending on whether the new training data consists of just a few parallel segments, 1k, or 100k? If the new data consists of only a few parallel segments, should one mix it with part of the old training data?
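To make the mixing idea concrete, here is a rough sketch of what I have in mind. It is not tied to any toolkit: `mix_corpora` and its `old_per_new` ratio are hypothetical names I made up, and the right ratio for a given amount of new data is exactly what I am unsure about.

```python
import random


def mix_corpora(new_pairs, old_pairs, old_per_new=1.0, seed=0):
    """Combine all new segment pairs with a random sample of old ones.

    old_per_new is a hypothetical knob: how many old pairs to sample
    for each new pair (capped at the number of old pairs available).
    """
    rng = random.Random(seed)
    n_old = min(len(old_pairs), round(len(new_pairs) * old_per_new))
    sampled_old = rng.sample(old_pairs, n_old)  # sampling without replacement
    mixed = list(new_pairs) + sampled_old
    rng.shuffle(mixed)  # interleave old and new before writing training files
    return mixed
```

The mixed list would then be written out as the source and target files for the incremental training run.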
Also, is there a way to add the new words found in the new training data to the vocabulary? The documentation specifies that the vocabularies cannot be changed when training from an existing model. However, as far as I understand, the vocabulary plays a role only in the word embedding matrix and not in the actual topology of the network.
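Conceptually, what I am imagining is something like the sketch below: keep the trained embedding rows as they are and append freshly initialised rows for the new words. This is only an illustration in plain Python, not something the toolkit exposes as far as I know; `extend_vocab` and the initialisation range are my own assumptions. I suspect any layer with one row per target word (such as the output projection) would need the same treatment.

```python
import random


def extend_vocab(vocab, embeddings, new_words, init_range=0.1, seed=0):
    """Return copies of vocab and embeddings with rows added for unseen words.

    vocab maps word -> row index; embeddings is a list of equal-length rows.
    Existing rows are kept unchanged; new rows get small random values
    (init_range is an assumed, hypothetical initialisation scale).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    vocab = dict(vocab)                         # do not mutate the caller's data
    embeddings = [row[:] for row in embeddings]
    for word in new_words:
        if word not in vocab:
            vocab[word] = len(embeddings)       # next free row index
            embeddings.append(
                [rng.uniform(-init_range, init_range) for _ in range(dim)]
            )
    return vocab, embeddings
```

The rest of the network would keep its shape under this scheme, since only the number of rows changes and not the embedding dimension.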