Length and Coverage Normalization - Deciding on parameters

Hi there,

I want to use length and coverage normalization to get better results while decoding. I’ve decided to use the approach from Wu et al. (GNMT), but I don’t know how to choose values for the alpha and beta parameters.

In their paper they say that they optimized them on a development set. Are there any fixed values that are known to work, or do I have to find the best values for my dataset empirically (and if so, how)?

Thanks in advance.

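There is no fixed rule; you’ll have to tune these for your task, typically by decoding a development set with different values and keeping the ones that score best. Some starting values are mentioned in the Transformer paper (beam size 4 and length penalty alpha = 0.6), if you’re using that architecture.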
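
In case it helps, here is a minimal sketch of what that tuning could look like. The scoring functions follow the formulas in the GNMT paper: lp(Y) = ((5 + |Y|) / 6)^alpha and cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1.0)). Everything else is an assumption for illustration: the function names, the `evaluate` callback (a hypothetical function you would write yourself that decodes your dev set with the given alpha and beta and returns a metric such as BLEU), and the small `eps` clamp that avoids log(0) when a source token gets no attention at all.

```python
import itertools
import math


def length_penalty(length, alpha):
    # GNMT length penalty: ((5 + |Y|) / (5 + 1)) ** alpha
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta, eps=1e-10):
    # attention[j][i] = attention probability of target step j on source token i.
    # eps is an assumption added here to avoid log(0); it is not part of the paper's formula.
    num_src = len(attention[0])
    total = 0.0
    for i in range(num_src):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def gnmt_score(log_prob, length, attention, alpha, beta):
    # s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y)
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attention, beta)


def grid_search(evaluate, alphas, betas):
    # evaluate(alpha, beta) is a hypothetical user-supplied callback that
    # decodes the dev set and returns a score where higher is better.
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        metric = evaluate(alpha, beta)
        if best is None or metric > best[0]:
            best = (metric, alpha, beta)
    return best  # (best_metric, best_alpha, best_beta)
```

You would call it with something like `grid_search(my_dev_bleu, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], [0.0, 0.2, 0.4])`, where `my_dev_bleu` is your own decoding and scoring routine. A coarse grid is usually a reasonable first pass, since every point costs a full decode of the dev set; you can then refine around the best cell.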