For example, when translating “How are you?” from English to German, I would like to get a mapping showing that “How” is translated to “Wie”, and so on. Here is the link to the research that describes this functionality.
Here is an image from the paper:
The translation method can return the target->source attention weights with the flag return_attention. For each target token, you could then select the source token with the highest attention weight as the alignment.
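If it helps, here is a minimal sketch of that idea with the Python API. The model directory, the tokenization, and the exact structure of the returned results are assumptions on my side and may differ depending on your setup and CTranslate2 version:

```python
import numpy as np
import ctranslate2

# Hypothetical converted model directory; adjust for your setup.
translator = ctranslate2.Translator("ende_ctranslate2/")

# Example source tokens (SentencePiece-style tokenization assumed).
source = ["▁How", "▁are", "▁you", "?"]
results = translator.translate_batch([source], return_attention=True)

target = results[0].hypotheses[0]
# One attention vector over the source tokens for each target token.
attention = np.array(results[0].attention[0])

for t, token in enumerate(target):
    s = int(attention[t].argmax())  # crude alignment: argmax over source positions
    print(token, "->", source[s])
```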
However, Transformer models have multiple attention heads so there is no guarantee the returned attention can be interpreted as alignments.
Most training frameworks have an option to supervise the training of one or several attention heads. For example:
Fairseq and Marian have similar options, and CTranslate2 takes them into account when converting the model.
Thanks for the fast response. Will check it.