I am a new user and trying to use the provided language translation model as follows:
onmt_translate -model /Downloads/iwslt-brnn2.s131_acc_62.71_ppl_7.74_e20.pt -src data/src-test.txt -output preds456.txt -verbose
The performance seems very poor on the provided data set. Here is an example:
[2020-08-30 17:04:05,878 INFO]
SENT 10: ['Jet', 'makers', 'feud', 'over', 'seat', 'width', 'with', 'big', 'orders', 'at', 'stake']
PRED 10: with big
PRED SCORE: -4.8279
What am I missing?
This model expects a German sentence as input.
You should also apply the same tokenization that was used for the training data. See:
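To illustrate what "the same tokenization" means, here is a minimal sketch of a Moses-style tokenizer. This is a hypothetical simplification for illustration only, not the actual preprocessing used for the IWSLT training data; in practice you would use the same tokenizer the model was trained with (e.g. the Moses `tokenizer.perl` script or a Python port such as sacremoses).

```python
import re

def simple_tokenize(sentence):
    # Hypothetical minimal tokenizer: split words and punctuation into
    # separate tokens, roughly approximating Moses-style tokenization.
    # The real preprocessing pipeline handles many more cases
    # (abbreviations, hyphens, special characters, etc.).
    return re.findall(r"\w+|[^\w\s]", sentence, re.UNICODE)

line = "Das ist ein Test."
print(" ".join(simple_tokenize(line)))
# Punctuation becomes its own token: "Das ist ein Test ."
```

Each line you feed to `onmt_translate -src` should already be in this space-separated token form, in the source language the model was trained on.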
Thanks so much for the help!
In the documentation (https://github.com/OpenNMT/OpenNMT-py), it looks like they run translate on the raw text file, as in:
onmt_translate -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose
Are you saying that the file data/src-test.txt cannot just be raw German text, but must be turned into a list of tokens or something?
data/src-test.txt is already tokenized (see for example the space before the periods).
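A quick way to see this is to check a line for space-separated punctuation. The sketch below is a hypothetical heuristic, not part of OpenNMT-py, but it shows the surface difference between raw and pre-tokenized text:

```python
def looks_tokenized(line):
    # Heuristic sketch: in pre-tokenized text, sentence-final punctuation
    # is split off as its own token, so it appears with a space before it.
    return " ." in line or " ," in line

print(looks_tokenized("Jet makers feud over seat width ."))  # pre-tokenized
print(looks_tokenized("Jet makers feud over seat width."))   # raw text
```

If your own input is raw text, run it through the same tokenizer that produced the training data before passing it to `onmt_translate`.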