I’m trying to do incremental learning with the PyTorch version of OpenNMT (OpenNMT-py) for English-to-Chinese translation. Although many questions have been asked on this topic, most of them concern the Torch (Lua) version or provide solutions based on it.
Here is some basic info:
- Task: English to Chinese
- OpenNMT version: PyTorch (OpenNMT-py)
- Architecture: Transformer
- Generic data size: 10 million pairs
- In-domain data size: 4k pairs
- Granularity: Subword units with BPE on both sides
- Vocab size: 45,571 for English, 32,232 for Chinese
Here are the three scenarios for incremental learning:
- Retraining a pre-trained model on NEW data with SAME training options for in-domain adaptation.
- Retraining a pre-trained model on NEW data with DIFFERENT training options for in-domain adaptation.
- Continuing a stopped or completed training on SAME data with SAME training options for more epochs.
To my understanding, the Torch version of OpenNMT provides full retraining options: -train_from, -continue, and -update_vocab, addressing the model’s parameters, the training hyper-parameters, and the model’s vocabulary, respectively. With these, I could roughly handle each scenario as follows:
- Scenario 1: -train_from the pre-trained model on the in-domain data, with -update_vocab to merge any new vocabulary.
- Scenario 2: -train_from the pre-trained model while specifying the new training options (plus -update_vocab if needed).
- Scenario 3: -train_from with -continue, so training resumes on the same data with the same options.
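For concreteness, here is a sketch of what those Torch-version commands might look like; the data and model file names are placeholders for my setup, not exact commands I have run:

```
# Scenario 1: retrain on new in-domain data, same options, merging new vocab
th train.lua -data indomain-train.t7 -save_model indomain_model \
    -train_from generic_model.t7 -update_vocab merge

# Scenario 2: same as above, but overriding training options on the command line
th train.lua -data indomain-train.t7 -save_model indomain_model \
    -train_from generic_model.t7 -update_vocab merge -learning_rate 0.5

# Scenario 3: continue a stopped/completed training on the same data and options
th train.lua -data generic-train.t7 -save_model generic_model \
    -train_from generic_model.t7 -continue
```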
However, the PyTorch version of OpenNMT seems to offer only the -train_from option. So how can I implement incremental learning for these three scenarios in OpenNMT-py?
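For reference, this is roughly what I run now for scenario 1 (a sketch with placeholder file names; I am not sure it is the right approach, and I don’t see equivalents of -continue or -update_vocab for scenarios 2 and 3):

```
# Sketch of my current OpenNMT-py attempt (placeholder paths):
# -train_from loads the generic model's weights before training on the
# in-domain data, but vocabulary and training options seem fixed.
python train.py -data indomain_data \
    -save_model indomain_model \
    -train_from generic_model_step_100000.pt \
    -encoder_type transformer -decoder_type transformer
```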