Length and Coverage Normalization - Deciding on parameters

emresatir · March 24, 2021, 11:00am

Hi there,

I want to use length and coverage normalization to get better results when decoding. I've decided to use the approach of Wu et al. (GNMT), but I don't know how to choose values for the parameters alpha and beta.
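For reference, the GNMT paper rescores each beam hypothesis Y for source X as s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y). The length penalty is lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha, and the coverage penalty is cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1.0)), where p_ij is the attention probability of the j-th target word on the i-th source word. The sketch below is a plain-Python illustration of those formulas, not any toolkit's implementation. The function names are hypothetical, and the small floor before the log is an assumption added here so that a source word that received no attention gives a large finite penalty instead of log(0).

```python
import math
from typing import Sequence


def length_penalty(length: int, alpha: float) -> float:
    """GNMT length penalty: ((5 + |Y|) / (5 + 1)) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attn: Sequence[Sequence[float]], beta: float) -> float:
    """GNMT coverage penalty.

    attn[j][i] is the attention probability on source word i at target step j.
    Each source word's total attention is capped at 1.0 before taking the log.
    Assumption: totals are floored at 1e-12 to avoid log(0).
    """
    num_src = len(attn[0])
    total = 0.0
    for i in range(num_src):
        covered = sum(step[i] for step in attn)
        total += math.log(max(min(covered, 1.0), 1e-12))
    return beta * total


def gnmt_score(log_prob: float, length: int,
               attn: Sequence[Sequence[float]],
               alpha: float, beta: float) -> float:
    """Rescore a finished hypothesis: log P(Y|X) / lp(Y) + cp(X; Y)."""
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attn, beta)
```

With alpha = 0 and beta = 0 the score reduces to the raw log-probability. Raising alpha divides the (negative) log-probability of longer hypotheses by a larger number, which counteracts beam search's bias toward short outputs; raising beta penalizes hypotheses that leave source words under-attended.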

In their paper they say that they optimized them on a development set. Are there specific fixed values for them, or do I have to find the best values for my dataset empirically (and if so, how)?

Thanks in advance.

francoishernandez · March 24, 2021, 11:21am

There is no fixed rule; you'll have to tune these to your task. Some starting values are mentioned in the Transformer paper (if you're using this architecture).
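One simple way to do the tuning is a grid search on a held-out development set: decode the dev set for each (alpha, beta) pair, score the output with your metric (e.g. BLEU), and keep the best pair, leaving the test set untouched. The sketch below assumes a hypothetical `evaluate(alpha, beta)` hook that you would implement by running your own decoder and scorer; `toy_dev_bleu` is a made-up stand-in so the example runs, and its peak carries no meaning for real data. For the length penalty, the Transformer paper's value of alpha = 0.6 is a reasonable point to center the grid on.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search(
    evaluate: Callable[[float, float], float],
    alphas: Iterable[float],
    betas: Iterable[float],
) -> Tuple[float, float, float]:
    """Return (best_alpha, best_beta, best_score), maximizing evaluate(alpha, beta)."""
    best = None
    for alpha, beta in product(alphas, betas):
        score = evaluate(alpha, beta)  # e.g. BLEU of the decoded dev set
        if best is None or score > best[2]:
            best = (alpha, beta, score)
    if best is None:
        raise ValueError("empty search grid")
    return best


def toy_dev_bleu(alpha: float, beta: float) -> float:
    """Hypothetical stand-in for 'decode dev set, then compute BLEU'."""
    return 30.0 - (alpha - 0.4) ** 2 - (beta - 0.2) ** 2


alphas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
betas = [0.0, 0.2, 0.4]
best_alpha, best_beta, best_score = grid_search(toy_dev_bleu, alphas, betas)
print(best_alpha, best_beta, best_score)
```

Each grid point costs a full decode of the dev set, so a coarse grid followed by a finer one around the best point is usually cheaper than a dense grid from the start.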