This paper says the authors used OpenNMT to implement 'boosting' techniques that improve the performance of NMT systems, such as removing some of the easiest sentences (ranked by sentence perplexity) at each epoch/training step.
These are the full details on the techniques they used:
How could these techniques be implemented via OpenNMT?
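As far as I know there is no built-in option for this, so one way is to drive it from outside the trainer: after each epoch, score the training set with the current checkpoint, drop the easiest fraction, rewrite the corpus files, and continue training from the last checkpoint on the filtered data. Below is a minimal Python sketch of the filtering part. It assumes you already have a summed token log-probability per training sentence from the current model (that input format is my own assumption, not something OpenNMT emits directly), and that "easiest" means lowest perplexity, as the paper's description suggests:

```python
import math

def sentence_perplexity(logprob_sum: float, n_tokens: int) -> float:
    """Perplexity of one sentence from its summed token log-probabilities."""
    return math.exp(-logprob_sum / max(n_tokens, 1))

def filter_easiest(src_lines, tgt_lines, logprob_sums, drop_fraction=0.1):
    """Drop the `drop_fraction` of sentence pairs with the LOWEST perplexity,
    i.e. the ones the current model already finds easiest (the boosting idea).
    `logprob_sums[i]` is the model's summed token log-prob for pair i."""
    ppl = [sentence_perplexity(s, len(t.split()))
           for s, t in zip(logprob_sums, tgt_lines)]
    order = sorted(range(len(ppl)), key=lambda i: ppl[i])  # easiest first
    keep = sorted(order[int(len(order) * drop_fraction):])  # original order
    return [src_lines[i] for i in keep], [tgt_lines[i] for i in keep]
```

You would then write the two returned lists back to disk and resume training on them for the next epoch; the `drop_fraction=0.1` value is illustrative, not taken from the paper.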
I would have added an extra step between steps 2 and 3:
Calculate the sentence-level BLEU score between the training targets and the model's translations of the training data, then filter out pairs with a very poor BLEU score but a high prediction score. These cases are most likely wrong translations or incorrect alignments.
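For illustration, here is a minimal sketch of that extra step using sacrebleu for sentence-level BLEU. It assumes you already have the model's translations and a per-sentence prediction score (average log-probability) for the training set; the two thresholds are illustrative assumptions, not values from the paper:

```python
import sacrebleu

BLEU_MAX = 15.0        # illustrative: below this, sentence BLEU is "really poor"
PRED_SCORE_MIN = -1.0  # illustrative: above this avg log-prob, the model is "confident"

def suspicious_pairs(refs, hyps, pred_scores):
    """Flag pairs the model is confident about (high prediction score) but that
    score badly against the reference (low BLEU): likely misaligned pairs."""
    flagged = []
    for i, (ref, hyp, score) in enumerate(zip(refs, hyps, pred_scores)):
        bleu = sacrebleu.sentence_bleu(hyp, [ref]).score  # 0-100 scale
        if bleu < BLEU_MAX and score > PRED_SCORE_MIN:
            flagged.append(i)
    return flagged
```

The flagged indices could then be removed from the corpus before the next training step, the same way as in the perplexity filtering above.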