I have a source sentence in German and a target sentence in English, and I’m using the pre-trained DE-EN model that comes with OpenNMT-py.
My objective is to generate the attention matrix for exactly the sentence in the tgt file; the model should not be allowed to come up with its own translation. I just want the (%) confidence.
src_file:
das daten ist vorbei .
tgt_file:
the date is over .
If I run translate.py, the beam search outputs:
das daten ist vorbei .
the 0.3053247 *0.4235602 0.0841463 0.0491322 0.1378366
data 0.0386287 *0.9214475 0.0232268 0.0106175 0.0060794
is 0.0516151 0.0754700 *0.4066976 0.3003595 0.1658577
over 0.0139861 0.0257822 0.0448820 *0.8648529 0.0504968
. 0.0078235 0.0125427 0.0771239 0.0835686 *0.8189413
</s> 0.0402967 0.0504099 0.0682272 0.0769914 *0.7640749
However, if I add -tgt tgt_file and print attn in the _score_target() method (in /OpenNMT-py/onmt/translate/translator.py; the exact spot where I print is sketched at the end of this post), I get:
tensor([[[0.3053, 0.4236, 0.0841, 0.0491, 0.1378]],
[[0.0386, 0.9214, 0.0232, 0.0106, 0.0061]], <- this row is "date"
[[0.0631, 0.0851, 0.4130, 0.2787, 0.1601]],
[[0.0165, 0.0332, 0.0391, 0.8633, 0.0480]],
[[0.0107, 0.0200, 0.0670, 0.0821, 0.8202]],
[[0.0448, 0.0610, 0.0644, 0.0758, 0.7540]]])
which is pretty similar to the output from beam search. Note that I’ve told it to translate “daten” as “date” instead of “data”, but I’m still getting the same attention weights.
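For context, this is how I’m turning that attn tensor into the (%) matrix I’m after. It’s plain PyTorch with nothing OpenNMT-specific, and the values are simply copied from the tensor above:

import torch

# attn from _score_target() has shape (tgt_len, batch=1, src_len);
# the numbers below are copied from the tensor printed above.
src_tokens = "das daten ist vorbei .".split()
tgt_tokens = ["the", "date", "is", "over", ".", "</s>"]

attn = torch.tensor([
    [[0.3053, 0.4236, 0.0841, 0.0491, 0.1378]],
    [[0.0386, 0.9214, 0.0232, 0.0106, 0.0061]],
    [[0.0631, 0.0851, 0.4130, 0.2787, 0.1601]],
    [[0.0165, 0.0332, 0.0391, 0.8633, 0.0480]],
    [[0.0107, 0.0200, 0.0670, 0.0821, 0.8202]],
    [[0.0448, 0.0610, 0.0644, 0.0758, 0.7540]],
])

matrix = attn.squeeze(1)  # drop the batch dimension -> (tgt_len, src_len)
print(" " * 6 + " ".join(f"{s:>9}" for s in src_tokens))
for tok, row in zip(tgt_tokens, matrix):
    cells = " ".join(f"{100 * w.item():8.2f}%" for w in row)
    print(f"{tok:>5} {cells}")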
Now if I replace my target sentence with something totally wrong, e.g., “the data is new .”, I’d still get a very similar attention matrix:
tensor([[[0.3053, 0.4236, 0.0841, 0.0491, 0.1378]],
[[0.0386, 0.9214, 0.0232, 0.0106, 0.0061]],
[[0.0516, 0.0755, 0.4067, 0.3004, 0.1659]],
[[0.0140, 0.0258, 0.0449, 0.8649, 0.0505]], <- this row is "new"
[[0.0206, 0.0637, 0.1246, 0.2603, 0.5308]],
[[0.0408, 0.0604, 0.0655, 0.0653, 0.7681]]])
What am I doing wrong here?