OpenNMT Forum

Using Translate on PreTrained Model

I am a new user trying to use the provided language translation model as follows:

onmt_translate -model /Downloads/iwslt-brnn2.s131_acc_62.71_ppl_7.74_e20.pt -src data/src-test.txt -output preds456.txt -verbose

The performance seems very poor on the provided data set. Here is an example:

[2020-08-30 17:04:05,878 INFO]
SENT 10: ['Jet', 'makers', 'feud', 'over', 'seat', 'width', 'with', 'big', 'orders', 'at', 'stake']
PRED 10: with big
PRED SCORE: -4.8279

What am I missing?

Thanks!
Stewart

This model expects a German sentence as input.

You should also apply the same tokenization that was used for the training data. See:

Thanks so much for the help!

In the documentation (https://github.com/OpenNMT/OpenNMT-py), it looks like they run onmt_translate on the raw text file, as in:

onmt_translate -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

Are you saying that the file data/src-test.txt cannot just be raw German text, but must be turned into a list of tokens or something?

data/src-test.txt is already tokenized (see for example the space before the periods).
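
To illustrate what "tokenized" means here, below is a rough Python sketch that puts spaces around punctuation, one sentence per line. The function name simple_tokenize and the regex are my own assumptions for illustration only. This is not the tokenizer that was used to train the model, so for real input you should run whatever tokenizer the training data went through.

```python
import re


def simple_tokenize(line: str) -> str:
    """Separate punctuation from words with spaces (illustrative only)."""
    # \w+ matches a run of word characters (including umlauts);
    # [^\w\s] matches a single punctuation character.
    tokens = re.findall(r"\w+|[^\w\s]", line)
    return " ".join(tokens)


if __name__ == "__main__":
    # Raw sentence in, space-separated tokens out.
    print(simple_tokenize("Das ist ein Test."))  # prints: Das ist ein Test .
```

Each output line is what a file like data/src-test.txt would contain: plain text with tokens separated by spaces, not a Python list.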