What would be the effect of disabling bucketing?
I guess it would make training slower, since every sentence would carry more tokens because of the extra padding.
But would it also lower performance, in terms of BLEU score for example?
Also, am I correct in assuming that each sentence is padded until it reaches the max length of its batch?
Answer to this question:
After testing: each sentence is padded to the max length of its batch. The padding is also independent for source and target, so each side is padded to its own batch maximum.
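To make the behaviour concrete, here is a minimal plain-Python sketch of per-batch padding. It is not the toolkit's actual code: the `<pad>` symbol, the helper names `pad_batch` and `padding_count`, and the toy sentences are all made up for illustration. It also counts the pad tokens with and without length-based grouping, which is where the extra cost of disabling bucketing would come from.

```python
PAD = "<pad>"  # hypothetical padding symbol, chosen for this example


def pad_batch(sentences, pad=PAD):
    """Pad every tokenized sentence to the longest sentence in the batch."""
    max_len = max(len(s) for s in sentences)
    return [s + [pad] * (max_len - len(s)) for s in sentences]


def padding_count(batches):
    """Total number of pad tokens added across all batches."""
    return sum(row.count(PAD) for batch in batches for row in pad_batch(batch))


# Source and target are padded separately, each to its own batch maximum.
src = [["a", "b", "c"], ["d"]]
tgt = [["x"], ["y", "z"]]
src_padded = pad_batch(src)  # every row has length 3
tgt_padded = pad_batch(tgt)  # every row has length 2

# Toy comparison: four sentences of lengths 1, 9, 2, 10 in batches of two.
sents = [["w"] * n for n in (1, 9, 2, 10)]
unbucketed = [sents[0:2], sents[2:4]]      # batches in arrival order
by_len = sorted(sents, key=len)
bucketed = [by_len[0:2], by_len[2:4]]      # similar lengths grouped together

print(padding_count(unbucketed), padding_count(bucketed))
```

In this toy case the arrival-order batches need far more pad tokens than the length-grouped ones, which matches the guess that training gets slower without bucketing. It says nothing about the BLEU question.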