Phrase tables and <unk> replacement


#1

Hi,

I have a translation with one <unk> tag. The translation is not the best, but it doesn’t matter in this case.

Source: Vergangene Woche trafen sich Tony Blair, Jacques Chirac und Gerhard Schröder in Berlin.
Translation: Tony <unk> Jacques Chirac and Gerhard Schroeder met President Gerhard Chirac

As I understand it, I need to build a phrase table text file like this:
Blair|||Blair
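For illustration, here is a minimal sketch of how a file in this source|||target format could be read into a lookup table. It assumes one entry per line with a single token on each side. The helper name load_phrase_table is hypothetical and not part of OpenNMT.

```python
def load_phrase_table(lines):
    """Parse 'source|||target' lines into a dict (hypothetical helper)."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or "|||" not in line:
            continue  # skip blank or malformed lines
        src, tgt = line.split("|||", 1)
        table[src.strip()] = tgt.strip()
    return table


print(load_phrase_table(["Blair|||Blair"]))  # {'Blair': 'Blair'}
```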

My goal was to replace <unk> with Blair.
My command is:
th translate.lua -phrase_table phrase.txt -replace_unk true -model Testdaten/news-all-de-en-train_epoch8_6.34.t7 -src Testdaten/MyText.txt -output pred.txt

Translation: Tony Schröder Jacques Chirac and Gerhard Schroeder met President Gerhard Chirac

The <unk> is replaced with Schröder, not Blair. Have I made a mistake, or is this not possible?

Norbert


(Guillaume Klein) #2

Hi,

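The feature worked as expected: the <unk> tag was replaced by a source token. However, the replacement relies on the model's attention vector, which can produce wrong alignments. The phrase table is only consulted for the source token that the attention selects. Here that token was "Schröder", which has no entry in your table, so it was copied unchanged. Alignments can be improved by training a bigger model and/or using more data.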
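A minimal Python sketch of the mechanism described above. It assumes each <unk> is replaced by the source token with the highest attention weight, that the phrase table is looked up with that source token, and that the token is copied as-is when there is no entry. This is an illustration, not OpenNMT's actual Lua code. The function replace_unk and the attention values below are hypothetical.

```python
from typing import Dict, List


def replace_unk(
    target: List[str],
    source: List[str],
    attention: List[List[float]],
    phrase_table: Dict[str, str],
    unk: str = "<unk>",
) -> List[str]:
    """Replace each <unk> using the most-attended source token.

    attention[i][j] is assumed to be the weight that target position i
    puts on source position j.
    """
    output = []
    for i, token in enumerate(target):
        if token != unk:
            output.append(token)
            continue
        # Pick the source position the model attended to most at this step.
        weights = attention[i]
        j = max(range(len(weights)), key=weights.__getitem__)
        src_word = source[j]
        # Look up only that source word; copy it if the table has no entry.
        output.append(phrase_table.get(src_word, src_word))
    return output


# Usage with the sentence from this thread (attention values are made up).
src = ("Vergangene Woche trafen sich Tony Blair , Jacques Chirac "
       "und Gerhard Schröder in Berlin .").split()
tgt = ["Tony", "<unk>", "Jacques"]
att = [[0.0] * len(src) for _ in tgt]
att[1][src.index("Schröder")] = 0.9  # attention wrongly peaks on Schröder
att[1][src.index("Blair")] = 0.1
print(replace_unk(tgt, src, att, {"Blair": "Blair"}))
```

Because the lookup is keyed on the attended source word, an entry for Blair only takes effect when the attention actually points at Blair; with the weights above the output is Tony Schröder Jacques.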


#3

Thank you. I had hoped it would work; I have used phrase tables successfully in SMT with Moses.

Norbert