Thanks! The SentencePiece model worked fine.
FYI, the link to the IWSLT14 preparation script in the table does not work.
I ran the preparation script on the IWSLT14 data, and then ran the Moses scripts on my own input data as follows:
perl tokenizer.perl -l de -threads 8 < test_de.txt > /tmp/tmp
perl lowercase.perl < /tmp/tmp > original_text_tokenized.txt
Unfortunately, I got unsatisfactory results on my data:
[2018-06-27 12:13:12,434 INFO]
SENT 2: ('sehr', 'geehrte', 'damen', 'und', 'herren', '!', 'bei', 'einem', 'lieferantenbesuch', 'in', 'china', 'habe', 'ich', 'von', 'einem', 'geschäftspartner', 'im', 'rahmen', 'einer', 'abendveranstaltung', 'drei', 'flaschen', 'rotwein', 'im', 'wert', 'von', 'ca.', 'jew', '.', '30.-', '€', 'geschenkt', 'bekommen', '.', 'diese', 'habe', 'ich', 'aus', 'gründen', 'der', 'wertschätzung', 'und', 'höflichkeit', 'angenommen', 'meine', 'frage', 'ist', 'nun', ':', 'wie', 'muss', 'ich', 'mich', 'nun', 'weiter', 'verhalten', '?', 'kann', 'ich', 'die', 'einzelnen', 'flaschen', 'meinen', 'mitarbeitern', 'als', 'weihnachtsgeschenk', 'überreichen', '?', 'vielen', 'dank', 'für', 'ihre', 'auskunft', '.', 'mit', 'freundlichen', 'grüßen')
PRED 2: ladies and gentlemen , ladies and gentlemen , in china , i have a business business in china in the rahmen of three bottles of red wine in the wert of about jew . and so , for reasons that i 've been given the appreciation and höflichkeit , i 've got my question now : how am i going to have to behave ? can i have the single bottles of my staff as a weihnachtsgeschenk person ? thank you very much .
PRED SCORE: -75.9947
To see whether this is an out-of-domain problem, I also applied the model to the preprocessed IWSLT data created by the script you linked. The output is still not good, but a bit better:
[2018-06-27 12:26:26,685 INFO]
SENT 7: ('die', 'erste', 'dieser', 'fallen', 'ist', 'ein', 'widerstreben', ',', 'komplexität', 'zuzugeben', '.')
PRED 7: the first of these fall is a widerstreben , komplexität .
PRED SCORE: -7.6431
[2018-06-27 12:26:26,685 INFO]
SENT 9: ('ich', 'denke', ',', 'es', 'gibt', 'eine', 'bestimmte', 'bedeutung', ',', 'auf', 'die', 'wir', 'es', 'beschränken', 'könnten', ',', 'aber', 'im', 'großen', 'und', 'ganzen', 'ist', 'das', 'etwas', ',', 'das', 'wir', 'aufgeben', 'werden', 'müssen', ',', 'und', 'wir', 'werden', 'die', 'komplizierte', 'sichtweise', 'annehmen', 'müssen', 'darauf', ',', 'was', 'wohlbefinden', 'ist', '.')
PRED 9: i think there 's a certain meaning that we could keep it on , but in the big , and all of this is something that we need to give up , and we will have the complicated view of what is well-being .
PRED SCORE: -23.9721
Could it be that this is simply the quality this model delivers? Is this the best it gets? It seems poor for IWSLT14.
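
Several German words (for example 'widerstreben' and 'komplexität' in sentence 7) appear unchanged in the English output. A rough way to list such pass-through words for a sentence could look like the sketch below. It uses only the Python standard library; `copied_tokens` is a hypothetical helper written for illustration and is not part of OpenNMT or Moses. It will also flag words that are legitimately identical in both languages (such as 'china' or 'in'), so the result is only a hint.

```python
def copied_tokens(source_tokens, prediction):
    """Return source tokens that reappear verbatim in the prediction.

    Hypothetical diagnostic helper (an assumption, not an OpenNMT API).
    Tokens are returned once each, in source order.
    """
    pred_tokens = set(prediction.split())
    seen = set()
    copied = []
    for tok in source_tokens:
        # Ignore punctuation and numbers; only alphabetic words are of interest.
        if tok.isalpha() and tok in pred_tokens and tok not in seen:
            seen.add(tok)
            copied.append(tok)
    return copied


# Sentence 7 from the log above.
src = ('die', 'erste', 'dieser', 'fallen', 'ist', 'ein', 'widerstreben',
       ',', 'komplexität', 'zuzugeben', '.')
pred = "the first of these fall is a widerstreben , komplexität ."

print(copied_tokens(src, pred))
```

Running the same check over a whole test set would show whether the untranslated words are mostly rare compounds, which would suggest a vocabulary coverage issue rather than a domain mismatch.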