OpenNMT

Implementing Boosting Techniques

Hi,

This paper reports using OpenNMT to implement ‘boosting’ techniques that improve the performance of NMT systems, such as removing some of the easiest sentences at each epoch or training step, based on sentence perplexity.

These are the full details on the techniques they used:

[image: the paper's description of the techniques]

How could these techniques be implemented via OpenNMT?

Thanks,

Albert

Hi,

If using OpenNMT-tf, you could:

  1. Train for one epoch
  2. Compute the score of each example in the original training dataset
  3. Apply the filtering or augmentation strategy of your choice based on the score and produce the data for the next epoch
  4. Go back to 1.
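As a rough illustration of step 3, here is a minimal sketch of one possible filtering strategy: dropping the easiest fraction of examples before the next epoch. The function name `drop_easiest` and the 10% default are hypothetical, and the sketch assumes the score is a log-likelihood-style value where a higher score means an easier example. If your scores are perplexities, the ordering should be reversed.

```python
def drop_easiest(examples, scores, fraction=0.1):
    """Return the examples left after removing the `fraction` with the highest scores.

    Assumption: a higher score means the model finds the example easier.
    """
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    n_drop = int(len(examples) * fraction)
    # Indices sorted from easiest (highest score) to hardest (lowest score).
    by_ease = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    dropped = set(by_ease[:n_drop])
    # Keep the surviving examples in their original order.
    return [ex for i, ex in enumerate(examples) if i not in dropped]
```

The surviving examples would then be written out as the training files for the next epoch.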

I would add an extra step between steps 2 and 3:

Calculate the BLEU score between the target side of the training data and the model's translation of the source side. Filter out the pairs that have a very poor BLEU score together with a high prediction score. These cases are most likely wrong translations or incorrect alignments.
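A minimal sketch of that extra filter, assuming the sentence-level BLEU scores and the prediction scores have already been computed by some other tool. The names `is_suspect` and `filter_suspects` and both threshold defaults are hypothetical. The sketch also assumes BLEU is on a 0 to 100 scale and that a higher prediction score means a more confident model.

```python
def is_suspect(bleu, prediction_score, max_bleu=10.0, min_prediction_score=-5.0):
    """Flag a pair whose BLEU is very low while the prediction score is high."""
    return bleu < max_bleu and prediction_score > min_prediction_score


def filter_suspects(pairs, bleus, prediction_scores, **thresholds):
    """Return the training pairs that are not flagged as suspect."""
    return [
        pair
        for pair, bleu, score in zip(pairs, bleus, prediction_scores)
        if not is_suspect(bleu, score, **thresholds)
    ]
```

The thresholds would need tuning on your own data, for example by inspecting a sample of the flagged pairs by hand.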


@guillaumekln, would it make sense to do the same thing you propose, but filter on the worst 10% of prediction scores instead of the BLEU score?

Sure, I did not mention the BLEU score in my response.

My bad! You’re right, I don’t know why I assumed you mentioned the BLEU score! Sorry about that.

Thanks. For the scoring, is there a way to output the translations and their scores instead of just seeing the results on the console?

You could just redirect the output to a file:

COMMAND > output.txt

Sorry for all the questions, but how would you write that exactly? For example, if I had this line to score some translations:

!onmt-main --config mft_pt2_40000_v2_preds.yml --auto_config score --features_file "drive/MyDrive/Dissertation/NHS scraped sentences/nhs_scraped_clean_bpe.txt" --predictions_file "drive/MyDrive/Dissertation/NHS scraped sentences/nhs_scraped_preds.txt"

where would I add the ‘command’ bit?

Your line is the COMMAND.

!onmt-main --config mft_pt2_40000_v2_preds.yml --auto_config score --features_file "drive/MyDrive/Dissertation/NHS scraped sentences/nhs_scraped_clean_bpe.txt" --predictions_file "drive/MyDrive/Dissertation/NHS scraped sentences/nhs_scraped_preds.txt" > output.txt
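Once the scores are in output.txt, they can be read back for filtering. The sketch below assumes each line has the form `score ||| sentence`, which is an assumption about the scoring output. Check a few lines of your own file first and adjust the separator if needed. The function names are hypothetical.

```python
def parse_score_line(line, separator=" ||| "):
    """Split one output line into a (score, sentence) tuple."""
    score_text, _, sentence = line.rstrip("\n").partition(separator)
    return float(score_text), sentence


def read_scores(path):
    """Read every non-empty line of a score file into (score, sentence) tuples."""
    with open(path, encoding="utf-8") as f:
        return [parse_score_line(line) for line in f if line.strip()]
```

For example, `read_scores("output.txt")` would return a list of tuples that can be sorted or thresholded before building the next epoch's data.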