I wanted to run some experiments with sparsemax vs. softmax, but I’m not completely sure that sparsemax is actually replacing softmax throughout the model. In my config I set `global_attention_function: "sparsemax"`, but in the Model Architecture printed when training starts, it still says Softmax and doesn’t mention sparsemax anywhere.
Is this just a display issue, or is there something I’m not accounting for?
e.g., here is the relevant part of my config (everything else omitted):
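```yaml
# Excerpt from my training config; all other options (data, model
# size, etc.) are left at the values I normally use. As I understand
# the docs, this should swap the attention normalizer from softmax
# to sparsemax.
global_attention_function: "sparsemax"
```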
I’m on OpenNMT-py v3.0.3