Source and target word correspondence in OpenNMT-py

Hi,
I want to know how to find, for a given source word, its corresponding target word among the translation tokens in OpenNMT-py.

Hi,

Maybe look at the -attn_debug translation option, which prints the attention information for each target word.
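For example, with a recent OpenNMT-py (the exact entry point may differ between versions, e.g. python translate.py in older releases):

    onmt_translate -model model.pt -src src.txt -output pred.txt -attn_debug

For each translated sentence this prints the attention weights of every target word over the source tokens.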

Thank you, I found it.

@guillaumekln
I have one doubt regarding the attention debug information, which I am getting through the ctranslate module.
I am doing the translation below, English to Hindi:
src: He is a learned counsel from India.
tgt: वह भारत के विद्वत अधिवक्ता हैं।
Here the translation of "learned counsel" is "विद्वत अधिवक्ता".
input_subwords: ['▁He', '▁is', '▁a', '▁learned', '▁counsel', '▁from', '▁India', '▁.']
output_subwords: ['▁वह', '▁भारत', '▁के', '▁विद्वत', '▁अधिवक्ता', '▁हैं', '▁।']

attention matrix (one row per target subword, one column per source subword):
[0.0058057308197021484, 0.01458070520311594, 0.0022700950503349304, 0.0020472274627536535, 0.002665645442903042, 0.01835373416543007, 0.06199241057038307, 0.8922834992408752]
[0.0031804030295461416, 0.029839416965842247, 0.009388826787471771, 0.006563771516084671, 0.006357409525662661, 0.07426086813211441, 0.32618066668510437, 0.544228196144104]
[0.0031960716005414724, 0.08972111344337463, 0.03608512878417969, 0.012650595046579838, 0.015241133980453014, 0.04158253222703934, 0.11371056735515594, 0.6878121495246887]
[0.0009559992467984557, 0.04551956057548523, 0.06536999344825745, 0.07045494765043259, 0.020516524091362953, 0.032580431550741196, 0.14256735146045685, 0.6220346093177795]
[0.00008994136442197487, 0.04222024977207184, 0.007950904779136181, 0.004777112044394016, 0.007352378685027361, 0.003796257544308901, 0.00408315472304821, 0.929729163646698]
[0.024851886555552483, 0.08670078963041306, 0.0056863161735236645, 0.004578482359647751, 0.012905516661703587, 0.029562558978796005, 0.013309900648891926, 0.8224037885665894]
[0.015281843952834606, 0.03236306458711624, 0.006210814695805311, 0.004451880697160959, 0.009255962446331978, 0.014088182710111141, 0.01956547610461712, 0.8987818956375122]
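For reference, this is how I am reading the alignment off the matrix (a minimal sketch; the variable names are mine, and "attention" holds the 7x8 matrix printed above):

    import numpy as np

    input_subwords = ['▁He', '▁is', '▁a', '▁learned', '▁counsel', '▁from', '▁India', '▁.']
    output_subwords = ['▁वह', '▁भारत', '▁के', '▁विद्वत', '▁अधिवक्ता', '▁हैं', '▁।']
    attn = np.array(attention)  # the 7x8 attention matrix printed above

    # for each target subword, take the source subword with the highest attention weight
    for t, row in enumerate(attn):
        s = int(row.argmax())
        print(output_subwords[t], '->', input_subwords[s])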

If I look at the attention vector for the 4th target subword '▁विद्वत' (i=3),
[0.0009559992467984557, 0.04551956057548523, 0.06536999344825745, 0.07045494765043259, 0.020516524091362953, 0.032580431550741196, 0.14256735146045685, 0.6220346093177795]

the maximum value is at the 4th position (i=3), which refers to the src token '▁learned'. That is absolutely correct.
But when I look at the attention scores for the next target subword, '▁अधिवक्ता',
[0.00008994136442197487, 0.04222024977207184, 0.007950904779136181, 0.004777112044394016, 0.007352378685027361, 0.003796257544308901, 0.00408315472304821, 0.929729163646698]

I should ideally get the maximum attention score at the 5th position (i=4), which would refer to '▁counsel'. However, this is not happening: for this 5th target subword the maximum attention score points to the src token '▁.', and the second highest to '▁is'.

Kindly clarify my doubt. I am using a Transformer-based NMT system.

Did you train your Transformer model with guided alignment? If not, the attention probabilities usually cannot be used as an alignment: a Transformer has many layers and attention heads, and none of them is explicitly trained to behave like a word alignment.

@guillaumekln
Could you please elaborate on this point about guided alignment? How can I achieve that?
And my aim is to have this alignment.

See for example: https://github.com/OpenNMT/OpenNMT-py/pull/1615

@guillaumekln
I see that this has been implemented in OpenNMT-py.
Do we have instructions on how to achieve this in OpenNMT?

Do you mean OpenNMT-tf? If yes, this is documented here: https://opennmt.net/OpenNMT-tf/alignments.html

For OpenNMT-py only.

You should look at the pull request linked above: https://github.com/OpenNMT/OpenNMT-py/pull/1615

There is also an entry in the FAQ: https://opennmt.net/OpenNMT-py/FAQ.html#can-i-get-word-alignment-while-translating
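In short, the workflow looks like this (a sketch based on the FAQ; check it for the exact option names and recommended values, as they may change between versions):

    # 1. produce Pharaoh-format word alignments for the training corpus
    #    (e.g. with fast_align or GIZA++) and pass them at preprocessing time
    python preprocess.py ... -train_align corpus.align

    # 2. train with the guided alignment loss supervising one attention head
    python train.py ... -lambda_align 0.05 -alignment_layer -3 -alignment_heads 1 -full_context_alignment

    # 3. request the learned alignments at translation time
    python translate.py ... -report_align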


@francoishernandez @guillaumekln
Is it a good idea to train your Transformer with fast_align or GIZA++ alignments, as suggested in the FAQ? The point is that nowadays everyone uses subword-based models, and these tools will produce alignments between subwords, which may not exactly make sense at the word level.
Also, are fast_align and GIZA++ language agnostic? Will they support alignment for all language pairs?

For subwords, see my reply and the paper mentioned in this topic: Best way to handle emojis during translation

I'm not sure it's 100% language agnostic, as the results may vary depending on the language pair (and you may need to apply some preprocessing to your data, tokenization for instance), but it should work for most pairs (https://www.aclweb.org/anthology/N13-1073.pdf).
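For reference, fast_align expects one sentence pair per line, with the tokenized source and target separated by " ||| ", e.g.:

    he is a learned counsel from india . ||| वह भारत के विद्वत अधिवक्ता हैं ।

It is typically run in both directions and then symmetrized with the bundled atools (the file names here are hypothetical):

    ./fast_align -i corpus.src-tgt -d -o -v > forward.align
    ./fast_align -i corpus.src-tgt -d -o -v -r > reverse.align
    ./atools -i forward.align -j reverse.align -c grow-diag-final-and > corpus.align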