Hi,
I want to know how to find, for a source word, its corresponding target word among the translation tokens in OpenNMT-py.
Hi,
Maybe look at the -attn_debug translation option, which prints the attention information for each target word.
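For example (model and file names here are placeholders, and the exact invocation may differ between OpenNMT-py versions):

    python translate.py -model model.pt -src src-test.txt -output pred.txt -attn_debug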
Thank you, I found it.
@guillaumekln
I have a doubt regarding the attention debug output, which I am getting through the ctranslate module.
I am doing the translation below, from English to Hindi:
src: He is a learned counsel from India.
tgt: वह भारत के विद्वत अधिवक्ता हैं।
Here the translation of "learned counsel" is "विद्वत अधिवक्ता".
input_subwords: ['▁He', '▁is', '▁a', '▁learned', '▁counsel', '▁from', '▁India', '▁.']
output_subwords: ['▁वह', '▁भारत', '▁के', '▁विद्वत', '▁अधिवक्ता', '▁हैं', '▁।']
attention vector:
[[0.0058057308197021484,0.01458070520311594,0.0022700950503349304,0.0020472274627536535,0.002665645442903042,0.01835373416543007,0.06199241057038307,0.8922834992408752],
 [0.0031804030295461416,0.029839416965842247,0.009388826787471771,0.006563771516084671,0.006357409525662661,0.07426086813211441,0.32618066668510437,0.544228196144104],
 [0.0031960716005414724,0.08972111344337463,0.03608512878417969,0.012650595046579838,0.015241133980453014,0.04158253222703934,0.11371056735515594,0.6878121495246887],
 [0.0009559992467984557,0.04551956057548523,0.06536999344825745,0.07045494765043259,0.020516524091362953,0.032580431550741196,0.14256735146045685,0.6220346093177795],
 [0.00008994136442197487,0.04222024977207184,0.007950904779136181,0.004777112044394016,0.007352378685027361,0.003796257544308901,0.00408315472304821,0.929729163646698],
 [0.024851886555552483,0.08670078963041306,0.0056863161735236645,0.004578482359647751,0.012905516661703587,0.029562558978796005,0.013309900648891926,0.8224037885665894],
 [0.015281843952834606,0.03236306458711624,0.006210814695805311,0.004451880697160959,0.009255962446331978,0.014088182710111141,0.01956547610461712,0.8987818956375122]]
If I look at the attention row for the 4th target subword '▁विद्वत' (i=3),
[0.0009559992467984557,0.04551956057548523,0.06536999344825745,0.07045494765043259,0.020516524091362953,0.032580431550741196,0.14256735146045685,0.6220346093177795]
the maximum value is at the 4th position (i=3), which refers to the src token '▁learned'; that is exactly what I expect.
But when I look at the attention scores for the next target subword, '▁अधिवक्ता',
[0.00008994136442197487,0.04222024977207184,0.007950904779136181,0.004777112044394016,0.007352378685027361,0.003796257544308901,0.00408315472304821,0.929729163646698]
I would ideally expect the maximum attention score at position 5 (i=4), which should refer to '▁counsel'.
However, this is not happening: for this 5th target subword the maximum attention score points to the src token '▁.', and the next highest to '▁is'.
Kindly clarify my doubt. I am using a Transformer-based NMT system.
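For reference, this is roughly how I am reading the matrix above (a minimal, self-contained Python sketch that only uses the subword lists and attention rows posted in this thread, with the weights rounded for brevity; I simply take the argmax of each row as the "aligned" source subword):

    # Rounded copy of the attention matrix posted above:
    # one row per output subword, one column per input subword.
    input_subwords = ['▁He', '▁is', '▁a', '▁learned', '▁counsel', '▁from', '▁India', '▁.']
    output_subwords = ['▁वह', '▁भारत', '▁के', '▁विद्वत', '▁अधिवक्ता', '▁हैं', '▁।']
    attention = [
        [0.0058, 0.0146, 0.0023, 0.0020, 0.0027, 0.0184, 0.0620, 0.8923],
        [0.0032, 0.0298, 0.0094, 0.0066, 0.0064, 0.0743, 0.3262, 0.5442],
        [0.0032, 0.0897, 0.0361, 0.0127, 0.0152, 0.0416, 0.1137, 0.6878],
        [0.0010, 0.0455, 0.0654, 0.0705, 0.0205, 0.0326, 0.1426, 0.6220],
        [0.0001, 0.0422, 0.0080, 0.0048, 0.0074, 0.0038, 0.0041, 0.9297],
        [0.0249, 0.0867, 0.0057, 0.0046, 0.0129, 0.0296, 0.0133, 0.8224],
        [0.0153, 0.0324, 0.0062, 0.0045, 0.0093, 0.0141, 0.0196, 0.8988],
    ]

    for tgt, row in zip(output_subwords, attention):
        best = row.index(max(row))  # source position with the highest attention weight
        print(tgt, '->', input_subwords[best], round(row[best], 3))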
Did you train your Transformer model with guided alignment? If not, the attention probabilities usually cannot be used as an alignment.
@guillaumekln
Could you please elaborate on this point about guided alignment? How can I achieve that?
My aim is to have this alignment.
See for example:
@guillaumekln
I see that this has been implemented in OpenNMT-py.
Do we have instructions on how to achieve this in OpenNMT?
Do you mean OpenNMT-tf? If yes, this is documented here: https://opennmt.net/OpenNMT-tf/alignments.html
For OpenNMT-py only.
There is also an entry in the FAQ: https://opennmt.net/OpenNMT-py/FAQ.html#can-i-get-word-alignment-while-translating
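For context, the alignment file provided for guided alignment training (e.g. one produced by fast_align) is in the Pharaoh format: one line per sentence pair, containing space-separated srcIndex-tgtIndex pairs (0-based). A minimal Python sketch for reading one such line (the example pairs are made up):

    # Parse one Pharaoh-format alignment line such as "0-0 1-2 2-1"
    # into a list of (source_index, target_index) pairs.
    def parse_pharaoh(line):
        pairs = []
        for item in line.split():
            src, tgt = item.split('-')
            pairs.append((int(src), int(tgt)))
        return pairs

    print(parse_pharaoh('0-0 1-2 2-1'))  # [(0, 0), (1, 2), (2, 1)]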
@francoishernandez @guillaumekln
Is it a good idea to train your Transformer with fast_align or GIZA++ as suggested in the FAQ? The point is that nowadays everyone uses subword-based models, so these tools will produce alignments between subwords, which may not make much sense at the word level.
Also, are fast_align and GIZA++ language agnostic? Will they support alignment for all language pairs?
For subwords, see my reply here, and the paper mentioned: Best way to handle emojis during translation
Not sure they are 100% language agnostic, as the results may vary depending on the language pair (and you may need to apply some preprocessing to your data, tokenization for instance), but they should work for most pairs (https://www.aclweb.org/anthology/N13-1073.pdf).
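Regarding subwords specifically, one common trick is to align at the subword level and then collapse the alignment to word level. A rough Python sketch (assuming SentencePiece-style tokens where '▁' marks the start of a word; the function names and the example segmentation are just illustrative):

    # Map each subword to the index of the word it belongs to,
    # assuming '▁' marks the first subword of every word.
    def subword_to_word_ids(subwords):
        word_ids = []
        word_id = -1
        for tok in subwords:
            if tok.startswith('▁') or word_id < 0:
                word_id += 1
            word_ids.append(word_id)
        return word_ids

    # Collapse subword-level alignment pairs to word-level pairs.
    def collapse_alignment(src_subwords, tgt_subwords, subword_pairs):
        src_map = subword_to_word_ids(src_subwords)
        tgt_map = subword_to_word_ids(tgt_subwords)
        return sorted({(src_map[i], tgt_map[j]) for i, j in subword_pairs})

    # Toy example with a made-up segmentation of "learned":
    src = ['▁He', '▁is', '▁a', '▁learn', 'ed', '▁counsel', '▁from', '▁India', '▁.']
    tgt = ['▁वह', '▁भारत', '▁के', '▁विद्वत', '▁अधिवक्ता', '▁हैं', '▁।']
    print(collapse_alignment(src, tgt, [(3, 3), (4, 3), (5, 4)]))  # [(3, 3), (4, 4)]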