Is it recommended to scale up the loss of a character-level seq2seq model by a suitable factor (say 7, assuming an English word has about 7 characters on average) to help it learn better, especially when the sentences are quite long (max_token_size=256)?
And if so, is there a way to pass such a scaling factor as an argument to train.py?
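For context, this is the kind of scaling I have in mind — a minimal PyTorch-style sketch, where the `--loss-scale` flag, the `model` signature, and the padding index are my own assumptions, not existing options in train.py:

```python
import argparse
import torch
import torch.nn as nn

# Hypothetical flag -- not an existing train.py option.
parser = argparse.ArgumentParser()
parser.add_argument("--loss-scale", type=float, default=1.0,
                    help="constant factor applied to the per-batch loss")
args = parser.parse_args()

# ignore_index=0 assumes the padding token has id 0.
criterion = nn.CrossEntropyLoss(ignore_index=0)

def training_step(model, src, tgt, optimizer):
    """One optimizer step; model(src, tgt_in) is assumed to
    return logits of shape (batch, seq_len, vocab)."""
    optimizer.zero_grad()
    logits = model(src, tgt[:, :-1])          # teacher forcing
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     tgt[:, 1:].reshape(-1))
    (loss * args.loss_scale).backward()       # scale before backprop
    optimizer.step()
    return loss.item()
```

So the invocation I'm imagining would be something like `python train.py --loss-scale 7`, if that (or an equivalent option) exists.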