Word repetition with Transformers


When translating one-word segments, the Transformer model returns the translation repeated multiple times. Is there any way to prevent this? It seems the model does not have enough information about how to translate these single words.


Maybe you could add some terminology with short segments in your training data.
A patch at decoding could be the block_ngram_repeat flag.

Would this flag always prevent the model from repeating words?

That’s the idea yes, hence ‘patch’ and not a true solution… :slight_smile:

Perfect! I guess it is also implemented in the server, isn’t it?

What would be an acceptable value with BPE? From what I’ve seen, the value is the size of the n-gram that must not be repeated.
For example:

Hello --> Bon @@jour bon @@jour (Bonjour bonjour)

I think you’d have to experiment to find a value fit to your task. You can probably start around 3.
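To see why the value matters with BPE, here is a minimal sketch of the general idea behind n-gram blocking. It is not the toolkit's actual implementation; the function name `would_repeat_ngram` and its arguments are made up for illustration. It only checks whether appending a candidate token would recreate an n-gram already present in the hypothesis.

```python
def would_repeat_ngram(tokens, candidate, n):
    """Return True if appending `candidate` to `tokens` would produce
    an n-gram that already occurs in `tokens`.

    Illustrative only: hypothetical helper, not an OpenNMT function.
    """
    if n <= 0 or len(tokens) < n - 1:
        return False
    # The n-gram that would be formed by the last n-1 tokens plus the candidate.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    new_ngram = prefix + (candidate,)
    # All n-grams already present in the hypothesis.
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


# A word split into two BPE tokens, about to be generated a second time.
hyp = ["bon", "@@jour", "bon"]
print(would_repeat_ngram(hyp, "@@jour", 2))  # True: the bigram is already there
print(would_repeat_ngram(hyp, "@@jour", 3))  # False: no trigram is repeated yet
```

In this sketch a value of 3 lets a two-token word be generated twice before anything is blocked, while a value of 2 stops the second copy. The comparison here is also case-sensitive, so "Bon @@jour bon @@jour" would not count as a repeated bigram. Whether the real flag behaves the same way on your data is something to verify by experiment.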

OK, the results with that flag are not what I expected: it changes the translation to avoid the repetition, but the result is just another useless translation, only without the repetition.

I’ll surely need to add single word segments and short segments to the training data.


I face the same problem. Could you please explain more how to “add some terminology with short segments in your training data” and how this way helps?
Thank you.

If the model only learns to translate long sentences, it might struggle to translate short ones. So you might want to add some data that is more representative of what you want to translate, hence short segments.

Thanks to your answer, I now understand the idea better.
To make it clear, could you please give me an example?

You may have
I like apple juice. // J'aime le jus de pommes.
in your parallel data.
But that may not be enough for the model to learn to translate, for instance,
juice // jus
and, more importantly, it might struggle to generate very short sequences if it has not learnt to do so.
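As a rough sketch of what adding such short segments could look like in practice, assuming the parallel data is held as two aligned lists of lines (source and target) and the terminology as a list of (source, target) pairs. The function name `add_terminology` and the data layout are assumptions made for illustration, not part of any toolkit.

```python
def add_terminology(src_lines, tgt_lines, glossary):
    """Append (source, target) terminology pairs to an aligned parallel corpus.

    Hypothetical helper for illustration. Pairs already present in the
    corpus are skipped so the same segment is not added twice.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    existing = set(zip(src_lines, tgt_lines))
    new_src, new_tgt = list(src_lines), list(tgt_lines)
    for src_term, tgt_term in glossary:
        if (src_term, tgt_term) not in existing:
            new_src.append(src_term)
            new_tgt.append(tgt_term)
            existing.add((src_term, tgt_term))
    return new_src, new_tgt


src = ["I like apple juice."]
tgt = ["J'aime le jus de pommes."]
src, tgt = add_terminology(src, tgt, [("juice", "jus")])
print(list(zip(src, tgt)))
```

The augmented lists would then be written back to the source and target training files and preprocessed (tokenization, BPE) exactly like the rest of the data. How many short segments to add relative to the full sentences is something to tune for your task.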

I like apple juice. // J’aime le jus de pommes
For this case, could you please give me an example of adding short segments to improve the translation?
Thanks a lot!

I just did in my original reply:
juice // jus

That’s what we call ‘terminology’: a source term / target term pair, like in a dictionary if you prefer.

Thank you so much. Your explanation helps me a lot.