The difference between maximum_labels_length and maximum_iterations


I have noticed that there are two parameters controlling the length of the decoded sequence, and I am aware that maximum_labels_length sets the maximum decoder-side sequence length during training.

But I am confused about maximum_iterations:

  1. Does it matter when training a model?
  2. When running inference with a trained model (say a checkpoint), can I increase it so the model can translate longer sentences?
  3. Does it interfere with maximum_labels_length?


  1. No.
  2. Yes.
  3. No, but they are related: maximum_labels_length defines the longest target sequence the model sees during training, so at inference time the model may struggle to produce good output longer than this value.
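To make the role of maximum_iterations concrete, here is a minimal sketch (not the actual OpenNMT internals; `greedy_decode`, `step_fn`, and the token ids are hypothetical) of how a decoding loop typically uses it as a hard cap on the number of generated tokens:

```python
EOS = 2  # hypothetical end-of-sequence token id

def greedy_decode(step_fn, start_token, maximum_iterations):
    """Generate tokens until EOS is emitted or the iteration cap is hit."""
    tokens = [start_token]
    for _ in range(maximum_iterations):
        next_token = step_fn(tokens)
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens

# Toy "model" that wants to emit three tokens and then EOS.
outputs = iter([5, 5, 5, EOS])
decoded = greedy_decode(lambda toks: next(outputs),
                        start_token=1, maximum_iterations=2)
# With maximum_iterations=2 the loop stops before EOS is reached,
# truncating the output — which is why raising it at inference time
# (answer 2 above) lets the model translate longer sentences.
```

Because the cap only bounds this inference-time loop, it has no effect on training (answer 1), where target lengths are governed by maximum_labels_length instead.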