Reset learning rate after update-vocabulary

Hi there,

I’m trying to reset the learning rate after updating my model with a new vocabulary (to do fine-tuning).
I’m executing these commands:

onmt-update-vocab --model_dir run_gen_bpe64mix_enfr/ --output_dir run_genind_bpe64mix_enfr/ --src_vocab gen_bpe64mix_enfr_en_vocab.txt --tgt_vocab gen_bpe64mix_enfr_fr_vocab.txt --new_src_vocab genind_bpe64mix_enfr_en_vocab.txt --new_tgt_vocab genind_bpe64mix_enfr_fr_vocab.txt --mode replace

And then:

onmt-main train_and_eval --model_type TransformerFP16 --auto_config --config config_genind_bpe64mix_enfr.yml

where config_genind_bpe64mix_enfr.yml is:

data:
  eval_features_file: genind_bpe64mix_enfr_en_training_set_val.txt
  eval_labels_file: genind_bpe64mix_enfr_fr_training_set_val.txt
  source_words_vocabulary: genind_bpe64mix_enfr_en_vocab.txt
  target_words_vocabulary: genind_bpe64mix_enfr_fr_vocab.txt
  train_features_file: genind_bpe64mix_enfr_en_training_set_train.txt
  train_labels_file: genip_bpe64mix_enfr_fr_training_set_train.txt
eval:
  batch_size: 32
  eval_delay: 18000
  exporters: last
infer:
  batch_size: 32
  bucket_width: 5
model_dir: run_genind_bpe64mix_enfr/
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 4000
  decay_type: noam_decay_v2
  label_smoothing: 0.1
  learning_rate: 2.0
  length_penalty: 0.6
  optimizer: LazyAdamOptimizer
  optimizer_params:
    beta1: 0.9
    beta2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 5
  batch_size: 3072
  batch_type: tokens
  bucket_width: 1
  effective_batch_size: 25000
  keep_checkpoint_max: 5
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_summary_steps: 100
  train_steps: 500000

When the in-domain model starts training, the initial learning rate equals the last value it had while training the out-of-domain model.

Is there a way to train the updated model with the hyperparameters reset?


The current approach is to:

  1. Change model_dir in the configuration file to a new directory (this will start a fresh training)
  2. Use --checkpoint_path on the command line to reference a trained checkpoint (this will load the trained weights)
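
The two steps above could be sketched roughly as follows. This is only a sketch under a few assumptions: retarget_model_dir is a hypothetical helper (not part of OpenNMT-tf), run_genind_ft/ and config_genind_ft.yml are made-up names for the new directory and config, and model_dir is assumed to be a top-level key as in the config posted above. It also assumes --checkpoint_path accepts the directory written by onmt-update-vocab; if it does not, point it at the checkpoint file inside that directory instead.

```shell
# Hypothetical helper (not part of OpenNMT-tf): print a copy of a config
# with its top-level model_dir replaced.
#   $1: existing config file
#   $2: new model directory (must not contain '|', '&' or '\',
#       which sed would interpret in the replacement)
retarget_model_dir() {
    sed "s|^model_dir:.*|model_dir: $2|" "$1"
}

# Step 1 (assumed file names): write a config that points at a new directory.
#   retarget_model_dir config_genind_bpe64mix_enfr.yml run_genind_ft/ > config_genind_ft.yml
#
# Step 2: train with the new config, loading the weights from the
# directory produced by onmt-update-vocab.
#   onmt-main train_and_eval --model_type TransformerFP16 --auto_config \
#       --config config_genind_ft.yml --checkpoint_path run_genind_bpe64mix_enfr/
```

Writing a separate config file leaves the original one untouched, so the two runs stay easy to compare.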

Hi Guillaume,

Thanks so much for your help.