Is there a way to score the validation set against multiple references?

mmartin9684 · July 10, 2020, 1:09pm

I have several valid translations for each of my source sentences, and am training the model by pairing each source sentence with each translation in the train_features_file and train_labels_file:

Features File — Labels File
source-sentence-1 — target-sentence-1-ref1
source-sentence-1 — target-sentence-1-ref2
source-sentence-2 — target-sentence-2-ref1
source-sentence-2 — target-sentence-2-ref2
…
The eval_features_file and eval_labels_file are set up the same way.

However, when generating a BLEU score during validation using this configuration, the scorer scores the predictions against each translation and retains all of these scores, rather than just using the best score per prediction.

Is there a better way to configure eval_features_file and eval_labels_file for multi-reference scoring?

Bachstelze · July 12, 2020, 8:33pm

Heyho Michael A. Martin,
You can score the generated translations against multiple references with NLTK. That is not a way to configure the evaluation itself, but we could write a validation script with it.
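
A minimal sketch of such a validation script might look like the following. It assumes the duplicated-source layout described in the question, that the files have already been read into parallel lists of lines, and that whitespace splitting is an acceptable tokenization. The helper names `group_references` and `multi_reference_bleu` are made up for illustration and are not part of OpenNMT-tf or NLTK; only `corpus_bleu` and `SmoothingFunction` come from NLTK.

```python
def group_references(sources, references, hypotheses):
    """Collapse repeated source lines into one entry holding all its references.

    Assumes the three lists are line-aligned and that a repeated source
    always yields the same prediction, so the first hypothesis is kept.
    """
    grouped = {}  # dicts keep insertion order (Python 3.7+)
    for src, ref, hyp in zip(sources, references, hypotheses):
        entry = grouped.setdefault(src, {"hyp": hyp, "refs": []})
        entry["refs"].append(ref)
    return grouped


def multi_reference_bleu(sources, references, hypotheses):
    """Corpus-level BLEU where each prediction is scored against all its references."""
    # Imported here so the grouping helper stays usable without NLTK installed.
    from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

    grouped = group_references(sources, references, hypotheses)
    # corpus_bleu expects, per segment, a list of tokenized references
    # and a single tokenized hypothesis.
    list_of_references = [
        [ref.split() for ref in entry["refs"]] for entry in grouped.values()
    ]
    tokenized_hypotheses = [entry["hyp"].split() for entry in grouped.values()]
    return corpus_bleu(
        list_of_references,
        tokenized_hypotheses,
        smoothing_function=SmoothingFunction().method1,
    )
```

Calling `multi_reference_bleu(source_lines, label_lines, prediction_lines)` on the contents of the eval files and the saved predictions would then give one score per evaluation run, with each source sentence counted once. Note that NLTK returns BLEU on a 0 to 1 scale, and the value will not be directly comparable to scores from other BLEU implementations that tokenize differently.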

Greetings from the translation space

guillaumekln · July 15, 2020, 8:14am

Multi-reference scoring is not supported in OpenNMT-tf.