Length Penalty Question


I have a question.

I want to know exactly what the average length penalty and the coverage penalty are.



You can find details in this Google paper, section 7:

What is the difference between the length penalty ("wu", proposed in the GNMT paper) and the average length penalty?

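The average length penalty simply divides each hypothesis's score (its log-probability) by its length. The wu penalty from the GNMT paper instead divides by lp(Y) = (5 + |Y|)^α / (5 + 1)^α, so α controls how strongly length is normalized: α = 0 disables normalization, and larger α favors longer hypotheses more.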
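Here is a minimal Python sketch of the two normalizers, assuming a hypothesis's score is its summed log-probability. The function names and example numbers are hypothetical; only the wu formula itself follows the GNMT paper.

```python
def average_penalty(cur_len: int) -> float:
    # "average": the normalizer is simply the hypothesis length |Y|.
    return float(cur_len)


def wu_penalty(cur_len: int, alpha: float) -> float:
    # GNMT length penalty: lp(Y) = ((5 + |Y|) / (5 + 1)) ** alpha
    return ((5.0 + cur_len) / 6.0) ** alpha


def normalized_score(log_prob: float, penalty: float) -> float:
    # Beam search ranks hypotheses by log P(Y|X) / lp(Y) instead of log P(Y|X).
    return log_prob / penalty


# Hypothetical hypotheses: a 2-token one at log-prob -2.0, a 6-token one at -3.0.
short = normalized_score(-2.0, average_penalty(2))
long_ = normalized_score(-3.0, average_penalty(6))
print(short, long_)
```

With raw scores the short hypothesis wins (-2.0 > -3.0); after average normalization they score -1.0 and -0.5, so the longer one wins. With α = 1 the wu penalty becomes (5 + |Y|) / 6, which also grows linearly with length but with an offset, so it normalizes less aggressively for short outputs.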


I have one more question.
What is the difference between the average penalty and the coverage penalty?

I think the difference is made clear in the Google paper? Length penalty normalizes the scores based on the sequence length. Coverage penalty uses the attention weights to penalize hypotheses that don’t fully cover the source sequence.
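A minimal sketch of the GNMT coverage penalty, cp(X; Y) = β · Σ_i log(min(Σ_j p_{i,j}, 1.0)), where p_{i,j} is the attention weight of target word j on source word i. The sketch assumes the attention is given as one row per target step. The `eps` clamp that avoids log(0) is an added assumption, not part of the paper's formula, and the function names are hypothetical.

```python
import math
from typing import Sequence


def coverage_penalty(attn: Sequence[Sequence[float]], beta: float,
                     eps: float = 1e-12) -> float:
    # attn[j][i]: attention weight of target step j on source token i
    # (the row-per-target-step layout is an assumption of this sketch).
    num_src = len(attn[0])
    total = 0.0
    for i in range(num_src):
        # Total attention that source token i received over all target steps.
        coverage = sum(step[i] for step in attn)
        # Cap at 1.0 so fully covered tokens contribute log(1) = 0.
        # The eps floor only guards against log(0); it is not in the GNMT formula.
        total += math.log(max(min(coverage, 1.0), eps))
    return beta * total


def gnmt_score(log_prob: float, lp: float, cp: float) -> float:
    # GNMT beam score: s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y)
    return log_prob / lp + cp


covered = [[0.5, 0.5], [0.5, 0.5]]   # each source token receives total attention 1.0
partial = [[0.9, 0.1], [0.9, 0.1]]   # the second source token receives only 0.2
print(coverage_penalty(covered, beta=0.2), coverage_penalty(partial, beta=0.2))
```

Because the capped coverage is at most 1.0, every log term is at most 0, so cp is never positive: a hypothesis that attends to every source token gets cp = 0, while one that leaves a source token under-attended is pushed down in the beam.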

