Hi all,
Just out of curiosity:
Is there a specific reason why LabelSmoothingLoss is applied only during training and not during validation? Wouldn't it be more meaningful to compare training loss and validation loss if both steps used the same loss function?
Thanks,
Arbin

Hi,
Label smoothing is a training trick. It makes little sense to apply it during validation, where you want the plain log-likelihood of the true target.
To make the two losses comparable, you can compute both the standard loss and the smoothed loss during training, then use the former for reporting and the latter for computing the gradients.
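For what it's worth, here is a minimal sketch of that idea in PyTorch (assuming a version where `F.cross_entropy` accepts a `label_smoothing` argument; the model, the batch, and the 0.1 smoothing factor are placeholders standing in for your own setup):

```python
import torch
import torch.nn.functional as F

def training_step(model, batch):
    """Backpropagate the smoothed loss but report the standard one."""
    inputs, targets = batch
    logits = model(inputs)

    # Smoothed loss: used only to compute the gradients.
    smoothed = F.cross_entropy(logits, targets, label_smoothing=0.1)

    # Standard loss: the log-likelihood of the true target, directly
    # comparable to the validation loss.
    with torch.no_grad():
        reported = F.cross_entropy(logits, targets)

    smoothed.backward()
    return reported.item()

# Toy usage: a linear "model" and a random batch, just to make this runnable.
model = torch.nn.Linear(10, 5)
batch = (torch.randn(8, 10), torch.randint(0, 5, (8,)))
print(training_step(model, batch))
```

The extra forward cost is negligible, since both losses share the same logits and the reported one is computed under `no_grad`.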