Can we export each token's score instead of the cumulative log probability score from CTranslate2? Based on experiments with other transformer models, higher precision can be achieved with the mean log probability score by tuning the threshold.
I’m not sure I understand. If you need the mean log probability, can’t you just divide the cumulative log probability by the number of tokens?
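For example (a minimal sketch with made-up values; the variable names are illustrative and not part of the CTranslate2 API):

```python
# Hypothetical cumulative log probability of a 4-token hypothesis.
cumulative_log_prob = -2.8
num_tokens = 4

# The mean log probability is just the cumulative score
# divided by the number of generated tokens.
mean_log_prob = cumulative_log_prob / num_tokens
print(mean_log_prob)  # -0.7
```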
Ah, that’s right. Another question: do we include the EOS token's probability score in the final cumulative probability score?
Good question. I found that we are a bit inconsistent here because the EOS score is included when running beam search (`beam_size > 1`), but not when running greedy search (`beam_size = 1`).
Probably we should always include the EOS score since it is actually part of the generated sequence. Any thoughts?
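To illustrate the inconsistency described above (a sketch with hypothetical per-token log probabilities, not the actual implementation):

```python
# Hypothetical per-token log probabilities of a generated sequence,
# with the EOS token scored separately.
token_log_probs = [-0.1, -0.3, -0.2]  # regular tokens
eos_log_prob = -1.2                   # end-of-sequence token

# Current beam search behavior: the EOS score is included.
score_with_eos = sum(token_log_probs) + eos_log_prob

# Current greedy search behavior: the EOS score is excluded.
score_without_eos = sum(token_log_probs)

print(round(score_with_eos, 6), round(score_without_eos, 6))
```

Since EOS is often a low-probability token, including or excluding it can noticeably shift both the cumulative and the mean score, which is why consistency matters here.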
Different text generation applications might perform differently here. Would it be easier for users to have an optional argument that returns all token scores and lets them compute the final score themselves?
We can possibly return the token scores, but I’m not sure they are frequently used. In the meantime, I made sure that the returned score is consistent between the beam and greedy decodings:
I think token scores would be useful for squeezing out the best performance when working with a score threshold. It makes a big difference whether we use the mean or the cumulative score. Currently both fairseq and Hugging Face can export each token's score as well as one aggregated score.
What exactly do you mean by score threshold? Can you refer to a paper?
In general, the larger the log probability score, the higher the precision the model can achieve, just like the precision/recall trade-off in binary classification.
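Concretely, the thresholding I have in mind looks like this (a sketch with invented scores; `hypotheses` and the threshold value are purely illustrative):

```python
# Hypothetical hypotheses paired with their mean log probability scores.
hypotheses = [
    ("translation A", -0.25),
    ("translation B", -0.90),
    ("translation C", -0.40),
]

# Keep only hypotheses scoring above the threshold. Raising the
# threshold keeps fewer outputs (lower recall) but the kept outputs
# are more likely to be correct (higher precision), just like moving
# along a precision/recall curve.
threshold = -0.5
kept = [text for text, score in hypotheses if score > threshold]
print(kept)  # ['translation A', 'translation C']
```

Whether the mean or the cumulative score works better as the thresholded quantity is exactly what per-token scores would let users tune.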