Network configuration & learning behaviour

In my example I use a made-up name, “Golly Waterkoski”, which does not occur in the training data. My source text “Golly Waterkoski wordt voorzitter” (in Dutch) is rendered in my prediction as “Golly Waterkoski becomes chairman”. However, if I back-translate the prediction into Dutch, I get “. Golly wordt voorzitter”.
I unfortunately did not keep a note of the two network configurations (which I know were different). Would that difference in network configuration be likely to explain why one model has learned this pattern for a proper name and the other has not?

Is this proper name an unknown word for your models?
That is, if you translate the sentences without the ‘-replace_unk’ option,
does an ‘<unk>’ label appear? How many ‘<unk>’ tokens do they generate?
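If it helps, a quick way to check is to count the ‘<unk>’ tokens per line in the prediction file. A minimal sketch (the file path and the exact unk token are assumptions; adjust to your setup):

```python
# Count '<unk>' tokens per sentence in a whitespace-tokenized
# predictions file. Path and token form are assumptions.
def count_unks(pred_path, unk_token="<unk>"):
    counts = []
    with open(pred_path, encoding="utf-8") as f:
        for line in f:
            # Count occurrences of the unk token in this sentence.
            counts.append(line.split().count(unk_token))
    return counts

# Example usage (hypothetical file name):
# counts = count_unks("pred.txt")
# print(sum(counts), "unknown tokens over", len(counts), "sentences")
```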

Maybe this difference has to do with the alignments learned by the attention mechanism of each model: the first one can recover the entire name, but the second can only recover the first name.
Another explanation is that the second model simply generates a translation of the form ‘<unk> wordt voorzitter’, so it is to be expected that it only introduces one word.

Have you tried other instances of your models, from other epochs? Does this behaviour repeat throughout the translations in general, or is it a particular “isolated” example?

Maybe at another point in the training (with a model from another epoch) it can translate this sentence better.
If it is a general error, you can try to continue training your model on sentence examples of this kind (Name1 Name2 verb object) to refine its behaviour.
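For the continued-training idea, a minimal sketch of generating synthetic source/target pairs of that (Name1 Name2 verb object) shape. All names, verbs, and objects below are made-up illustrations, not data from this thread:

```python
# Sketch: build synthetic (Dutch, English) sentence pairs of the form
# "Name1 Name2 verb object" for continued training. Every name and
# pattern here is an invented placeholder.
import itertools

FIRST_NAMES = ["Golly", "Pieter", "Anna"]
LAST_NAMES = ["Waterkoski", "Jansen", "Bakker"]
PATTERNS = [
    ("{f} {l} wordt voorzitter", "{f} {l} becomes chairman"),
    ("{f} {l} wordt directeur", "{f} {l} becomes director"),
]

def make_pairs():
    pairs = []
    # Cross every name combination with every sentence pattern.
    for f, l in itertools.product(FIRST_NAMES, LAST_NAMES):
        for src_t, tgt_t in PATTERNS:
            pairs.append((src_t.format(f=f, l=l), tgt_t.format(f=f, l=l)))
    return pairs
```

You would then write the two sides out to parallel src/tgt files and continue training from an existing checkpoint.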

Hi Eva,
Thanks, these are interesting points. Our “test machine” with the GPU is busy at the moment, but I will try these suggestions out when it becomes free. I deliberately chose imaginary names which are NOT in the training data and therefore not in the src or tgt vocabularies. In your experience, how far back in the training (back to which epoch) is one likely to find a “better” translation or a better handling of an unknown name?

Hi Terence,

I cannot tell you an exact moment; it depends on the training settings and mostly on the training set.

I would try out the (5?) models with the lowest perplexity on the validation set (which are expected to be the ones that generalize best) and see what happens :slight_smile:
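A small sketch of that selection, assuming checkpoint file names that embed the validation perplexity in an OpenNMT-style format such as `model_acc_57.2_ppl_7.89_e10.pt` (the naming scheme is an assumption; adapt the regex if yours differs):

```python
# Sketch: pick the N checkpoints with the lowest validation perplexity,
# parsed from file names like "model_acc_57.2_ppl_7.89_e10.pt".
# The naming convention is an assumption about your setup.
import re

def best_checkpoints(filenames, n=5):
    ppl_re = re.compile(r"_ppl_([0-9]+(?:\.[0-9]+)?)_")
    scored = []
    for name in filenames:
        m = ppl_re.search(name)
        if m:
            # Keep (perplexity, filename) so sorting orders by ppl.
            scored.append((float(m.group(1)), name))
    return [name for _, name in sorted(scored)[:n]]
```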