I want to use length and coverage normalization to get better results while decoding. I've decided to use the approach from Wu et al. (GNMT), but I don't know how to choose the values of the parameters alpha and beta.
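For reference, the GNMT paper scores each beam hypothesis $Y$ for a source sentence $X$ by combining a length penalty $lp$, controlled by $\alpha$, with a coverage penalty $cp$, controlled by $\beta$. Here $p_{i,j}$ is the attention probability of the $j$-th target word on the $i$-th source word. Setting $\alpha = 0$ and $\beta = 0$ reduces this to ranking by plain $\log P(Y \mid X)$.

$$
\begin{aligned}
s(Y, X) &= \frac{\log P(Y \mid X)}{lp(Y)} + cp(X; Y) \\
lp(Y) &= \frac{(5 + |Y|)^{\alpha}}{(5 + 1)^{\alpha}} \\
cp(X; Y) &= \beta \sum_{i=1}^{|X|} \log\Big(\min\Big(\sum_{j=1}^{|Y|} p_{i,j},\ 1.0\Big)\Big)
\end{aligned}
$$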
In their paper they say that they tuned these on a development set. Are there any fixed values that are commonly used, or do I have to find the best values for my own dataset empirically (and if so, how)?
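A straightforward way to do the empirical search is a grid search on the development set. The sketch below assumes you have already decoded an n-best list for each dev sentence and stored each hypothesis's tokens, its $\log P(Y \mid X)$, and its attention matrix. It rescores those lists with the GNMT formula for every $(\alpha, \beta)$ pair and keeps the pair with the best dev metric. Two caveats: rescoring a fixed n-best list only approximates re-running beam search with each setting, because the penalties can also change which hypotheses survive pruning. Also, the names `grid_search` and `exact_match` are hypothetical, and `exact_match` is only a toy stand-in for a real metric such as corpus BLEU.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical n-best entry: (tokens, log P(Y|X), attention) where
# attention[j][i] is the weight of target position j on source position i.
Candidate = Tuple[List[str], float, List[List[float]]]
Metric = Callable[[List[List[str]], List[List[str]]], float]


def length_penalty(length: int, alpha: float) -> float:
    """GNMT length penalty: (5 + |Y|)^alpha / (5 + 1)^alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention: Sequence[Sequence[float]], beta: float) -> float:
    """GNMT coverage penalty: beta * sum_i log(min(sum_j p_ij, 1.0))."""
    if not attention:
        return 0.0
    total = 0.0
    for i in range(len(attention[0])):  # loop over source positions
        coverage = sum(row[i] for row in attention)  # attention mass on source i
        # Assumption: floor at 1e-9 to avoid log(0) on toy data; not in the paper.
        total += math.log(max(min(coverage, 1.0), 1e-9))
    return beta * total


def gnmt_score(cand: Candidate, alpha: float, beta: float) -> float:
    """Length-normalized log-probability plus coverage penalty."""
    tokens, logprob, attention = cand
    return logprob / length_penalty(len(tokens), alpha) + coverage_penalty(attention, beta)


def grid_search(
    nbest: List[List[Candidate]],
    refs: List[List[str]],
    metric: Metric,
    alphas: Sequence[float],
    betas: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (alpha, beta, dev_score) maximizing `metric` on the dev set."""
    best_score, best_alpha, best_beta = float("-inf"), 0.0, 0.0
    for alpha, beta in itertools.product(alphas, betas):
        # Pick the top-scoring hypothesis per sentence under this (alpha, beta).
        hyps = [max(cands, key=lambda c: gnmt_score(c, alpha, beta))[0] for cands in nbest]
        score = metric(hyps, refs)
        if score > best_score:  # ties keep the earlier grid point
            best_score, best_alpha, best_beta = score, alpha, beta
    return best_alpha, best_beta, best_score


def exact_match(hyps: List[List[str]], refs: List[List[str]]) -> float:
    """Toy stand-in for a real metric such as corpus BLEU."""
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)
```

In practice you would plug in a real metric and a finer grid, for example `[i / 10 for i in range(11)]` for both `alphas` and `betas`. It is also worth confirming the chosen values on a held-out set that was not used for the search.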
Thanks in advance.