OpenNMT Forum

Calculate TER metrics instead of using 'Levenshtein_distance'

Hi everyone,

I want to calculate the TER metric given a reference and a hypothesis string. For this purpose, I read the following paper:
"A Study of Translation Edit Rate with Targeted Human Annotation".

In the paper, they only give pseudocode for counting the number of 'shifted words'; insertions, deletions, and substitutions of words are handled by dynamic programming.
Currently I am using word-level Levenshtein distance, but I do not get the expected results: when I compare the scores reported in the paper against Levenshtein distance on the same data, the outputs do not match.

I would be highly thankful if somebody could explain how to calculate the TER metric according to this paper. For example:
the paper considers insertions, deletions, and substitutions of single words, as well as shifts of word sequences (a shift of two adjacent words counts as a single edit).
Also, all edits, including shifts of any number of words over any distance, have equal cost, i.e. 1.
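Since TER is essentially word-level edit distance plus a shift operation, a brute-force sketch can illustrate why plain Levenshtein distance gives different numbers. The sketch below is my own simplified illustration, not the paper's exact algorithm: the real tercom implementation restricts which shifts are even considered (the shifted span must match the reference somewhere), whereas this version simply tries every contiguous shift, greedily applies the one that most reduces the word-level edit distance, and repeats until no shift helps.

```python
def word_edit_distance(hyp, ref):
    # Standard word-level Levenshtein: insert/delete/substitute, cost 1 each.
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]


def simple_ter(hyp_str, ref_str):
    # Simplified TER: greedily apply the single shift that most reduces
    # the edit distance, count each applied shift as one edit (cost 1,
    # regardless of span length or distance), then add the remaining
    # insert/delete/substitute edits and normalize by reference length.
    hyp, ref = hyp_str.split(), ref_str.split()
    shifts = 0
    edits = word_edit_distance(hyp, ref)
    while True:
        best = None
        for i in range(len(hyp)):                  # span start
            for l in range(1, len(hyp) - i + 1):   # span length
                span = hyp[i:i + l]
                rest = hyp[:i] + hyp[i + l:]
                for j in range(len(rest) + 1):     # new position
                    if j == i:
                        continue  # re-inserting in place reproduces hyp
                    cand = rest[:j] + span + rest[j:]
                    e = word_edit_distance(cand, ref)
                    if e < edits and (best is None or e < best[0]):
                        best = (e, cand)
        if best is None:
            break  # no shift reduces the edit distance any further
        shifts += 1
        edits, hyp = best
    return (edits + shifts) / len(ref)
```

For example, `simple_ter("b c d a", "a b c d")` finds one shift (moving "a" to the front) and no further edits, giving 1/4 = 0.25, where plain word-level Levenshtein would report 2 edits. The exhaustive shift search is very expensive, so this is only practical for short sentences; it is meant to clarify the scoring, not to reproduce tercom exactly.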

Thank you

Hi,

Are you looking for implementations of the TER metric? Here is one in Java:

For Python, SacreBLEU can also compute TER: