Sparsemax not shown in printed model architecture: display issue or not being applied?

I want to run some experiments comparing sparsemax with softmax, but I’m not sure whether sparsemax is actually replacing softmax throughout the model. In my config I set global_attention_function: "sparsemax", but the Model Architecture printed when training starts still shows Softmax and does not mention sparsemax anywhere.
Is this just a display issue, or is there something I’m not accounting for?
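
For context, here is a minimal pure-Python sketch of the difference I would expect to see if sparsemax were really in use. It is not OpenNMT-py's implementation; the function names and the example scores are my own, chosen only for illustration. The idea is that sparsemax can give exactly zero weight to some positions, while softmax never does, so looking at the attention weights might be one way to tell which function is actually applied.

```python
import math


def softmax(scores):
    """Standard softmax: every output is strictly positive."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    scores onto the probability simplex. Outputs can be exactly zero."""
    ordered = sorted(scores, reverse=True)
    running = 0.0
    support_size = 0
    support_sum = 0.0
    for j, value in enumerate(ordered, start=1):
        running += value
        # Position j is in the support while 1 + j * z_(j) > sum of top-j scores.
        if 1 + j * value > running:
            support_size = j
            support_sum = running
    threshold = (support_sum - 1) / support_size
    return [max(s - threshold, 0.0) for s in scores]


# Hypothetical attention scores, not taken from a real model.
example_scores = [2.0, 1.5, -1.0]
print("softmax:  ", softmax(example_scores))
print("sparsemax:", sparsemax(example_scores))
```

With these scores the sparsemax output has a hard zero in the last position, whereas all softmax outputs stay positive. If the attention weights from my trained model never contain exact zeros, I would take that as a hint that softmax is still being used.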


I’m on OpenNMT-py v3.0.3