Why does learning rate change?

When I train my model, the learning rate increases at a constant rate until it reaches 0.001, then gradually decreases afterwards. Why is this? Why isn’t the learning rate simply set to a constant value?

My config is:

model_dir: my_model

data:
  train_features_file: source_train.sp
  train_labels_file: target_train.sp
  eval_features_file: source_val.sp
  eval_labels_file: target_val.sp
  source_vocabulary: final.vocab
  target_vocabulary: final.vocab

train:
  save_checkpoints_steps: 1000

eval:
  external_evaluators: BLEU

infer:
  batch_size: 32

The command I use for training is:
onmt-main --model_type Transformer --config config.yml --auto_config train --with_eval

You are using the Transformer configuration, so training follows the learning rate schedule described in section 5.3 of the “Attention Is All You Need” paper: a linear warmup over the first steps, followed by a decay proportional to the inverse square root of the step number.
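
As a minimal sketch of that schedule (the function name below is mine; the values 2.0, 512 and 8000 are the ones the Transformer auto_config plugs in, as shown later in this thread):

def noam_learning_rate(step, scale=2.0, model_dim=512, warmup_steps=8000):
    # Linear warmup for the first `warmup_steps` steps, then a decay
    # proportional to the inverse square root of the step number.
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

print(noam_learning_rate(8000))    # ~0.001, the peak reached at the end of warmup
print(noam_learning_rate(100000))  # ~0.00028 after 100k steps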

Ah, I see. Thank you

At what point in the learning rate schedule should I switch to my domain-specific corpus for transfer learning? My parent corpus is 2.8M, and my domain-specific corpus is 40K. I was going to switch over once the learning rate is decreasing and around 0.0002, but I really have no idea how to choose. For reference, the learning rate peaks at 0.001 (I edited my original post to reflect this).

I think it’s easier to reason in terms of the number of training steps. Maybe start with 100k training steps on the generic data before starting the adaptation. As you gain more experience, you will probably find a better value.
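
For the adaptation step itself, here is a rough sketch of what the config change could look like (the in-domain file names are hypothetical): point the data section at the domain-specific files and keep the same model_dir, so that training resumes from the latest generic checkpoint and the schedule continues from the current step.

model_dir: my_model          # unchanged: the latest checkpoint is restored from here

data:
  train_features_file: source_in_domain.sp   # hypothetical in-domain file
  train_labels_file: target_in_domain.sp     # hypothetical in-domain file
  eval_features_file: source_val.sp
  eval_labels_file: target_val.sp
  source_vocabulary: final.vocab             # vocabulary must stay the same
  target_vocabulary: final.vocab

The training command itself stays the same as above.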

If I wait until 100K steps, the learning rate will have decreased to the minimum (0.000001). Is it okay to start training on the domain-specific corpus at the minimum learning rate?

Here is the learning rate value that you get after 100k steps with the Transformer configuration:

>>> import opennmt
>>> opennmt.schedules.NoamDecay(2.0, 512, 8000)(100000)
<tf.Tensor: shape=(), dtype=float32, numpy=0.00027950708>

By changing the learning rate, the training procedure is doing a kind of “simulated annealing”.

:wink: