Can I get a translation confidence metric for each translated word?
translate.py has a “-verbose” option that prints “pred_score”, but only for the whole sentence.
I want to be able to highlight words that have a low confidence score (or any other metric).
I know that during decoding the system calculates a probability for every token, so I think it has to be possible.
The per-word scores are not saved. You would need to edit the decoding code to return and print them.
You could score the words with an external language model, which is useful for noisy channel modeling, but that would not be the score of your translation model.
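To illustrate what returning per-word scores from the decoder could look like, here is a minimal sketch in plain Python. It is not the translate.py code: the function names, the toy logits, and the 0.5 threshold are all made up for the example. It assumes you have already changed the decoder to hand back, for each output step, the raw scores over the vocabulary and the index of the token that was picked.

```python
import math

def token_confidences(step_logits, chosen_ids):
    """Probability the model assigned to each chosen token.

    step_logits: one list of unnormalized scores per decoding step
                 (hypothetical output of a modified decoder).
    chosen_ids:  index of the token emitted at each step.
    """
    probs = []
    for logits, idx in zip(step_logits, chosen_ids):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        probs.append(math.exp(logits[idx] - log_z))
    return probs

def flag_low_confidence(tokens, probs, threshold=0.5):
    """Pair each token with its probability and a low-confidence flag."""
    return [(tok, p, p < threshold) for tok, p in zip(tokens, probs)]

# Toy data: 3 decoding steps over a 3-word vocabulary (illustrative only).
step_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
chosen_ids = [0, 2, 1]
tokens = ["the", "cat", "sleeps"]

probs = token_confidences(step_logits, chosen_ids)
flags = flag_low_confidence(tokens, probs)

# Summing the per-token log-probabilities gives a sentence-level score.
sentence_score = sum(math.log(p) for p in probs)

for tok, p, low in flags:
    print(f"{tok}\t{p:.3f}\t{'LOW' if low else ''}")
```

In this toy run only the second token is flagged, because its scores are uniform over the vocabulary. With subword units (BPE) you would still need to decide how to combine the scores of the pieces that make up one word, for example by taking the minimum or the product.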