Sentence length & translation quality

For one particular engine (Eng2Dutch) I’ve been investigating the relationship between translation quality and the length of the input sentence. For sentences with up to 30 words I’ve been getting perfectly constructed Dutch sentences which express the meaning of the input sentence. With sentences longer than 30 words I see grammatical or semantic wobbles. I have used out-of-the-box settings for these tests.
Has anyone else experienced this?

Hello Terence,

You may need to try a deeper network.
Try first with 3 layers of 500 (vs. the 2 x 500 default) and with 2 layers of 800.

Let us know how it goes.

Did you also measure with a BLEU score, on both in-domain and external test sets?



Well, trying with 4 x 600 brought BLEU up from 29.30 to 32.95 on in-domain material. I have not systematically done anything with external test sets yet (just throwing stuff into my client GUI). Clearly a promising avenue of investigation.

How many sentences are you measuring your BLEU on?
What BLEU toolkit are you using?

Also, are you using the default 50k vocab size and sequence length of 50?

Bear in mind that with Dutch you may have to use BPE to avoid huge OOV at inference time.

This was a small test on 2398 sentences. I used defaults for vocab & sequence length. I am largely avoiding huge numbers of OOV by using the -phrase_table option as I have a 300K entry dictionary from an old Rule Based system.

To follow up, I have now used the 4x600 model in a live production setting to translate a complex legal document comprising some very long sentences from Dutch into English. For most translated sentences little or no post-editing was required; in only one sentence did the engine lose its way. This particular configuration seems to work well for this (in-domain) material.


"Bear in mind that with Dutch you may have to use BPE to avoid huge OOV at inference time."
Are you implying that use of BPE could deal with a term like “systeemontwikkelingsbeheerder”, which would break down into “systeem ontwikkelings beheerder” (system development manager)? My old Rule Based System could accomplish that quite nicely, and it would be great to do that with Neural MT. The formation of compound nouns in Germanic languages can be quite arbitrary.

Hi Terence,

I don’t think BPE will help with Germanic compounding. I’ve been experimenting with character-based recognition using lots of hidden layers for language pairs involving at least one morphologically rich language. I’m not sure it works in all cases, but there is some promise, and I think it’s interesting from both the linguistic and neural side of things to see what kind of learning happens in the different hidden layers.

But to make a long story short, I think unless your system is learning the compounding from known examples, your best bet is post-processing the NMT output.

It’s not perfect but it DOES work.

I have some NL <=> EN/FR models working that way.

@dbl Hi David,
Keeping to the short story, I have scripts that can split these compounds during pre- or post-processing. In a translation project setting that’s doable.
@vince62s Based on what you report I’ll give this a try at my next training session.

@vince62s I’m intrigued… what BPE settings did you use?

Pretty much the same as we showed here:


Hi David,
Following up on this issue I thought I’d mention that we eventually decided to implement compound splitting on the client side, i.e. before it is sent across to the server. For Dutch-English this works well by reference to a word list to avoid inappropriate splitting. I know this isn’t “pure NMT” but used in translation projects this technique certainly reduces the amount of post-processing. The splitter will clearly also work for other Germanic languages with the appropriate word list.
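
To make the idea concrete, here is a minimal sketch of word-list-driven compound splitting as a client-side pre-processing step. It is not the splitter described above, only an illustration under stated assumptions: the tiny `WORD_LIST`, the linking elements in `LINKERS`, the minimum part length `MIN_PART`, and the function names are all hypothetical. A word is split only when every part is found in the word list, and a word that is itself in the list is never split, which is one way to avoid inappropriate splitting.

```python
from typing import List, Optional, Set

# Hypothetical mini word list; a real one would hold many thousands of entries.
WORD_LIST: Set[str] = {"systeem", "ontwikkeling", "beheerder", "beheer"}
LINKERS = ("", "s")  # assumed linking elements between compound parts
MIN_PART = 4         # assumed minimum length of a compound part


def split_compound(word: str, lexicon: Set[str] = WORD_LIST) -> Optional[List[str]]:
    """Return the parts of `word` if it decomposes fully into lexicon entries, else None."""
    word = word.lower()
    if word in lexicon:
        return [word]  # known words are kept whole
    # Try the longest candidate head first.
    for end in range(len(word) - MIN_PART, MIN_PART - 1, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            tail = split_compound(rest[len(linker):], lexicon)
            if tail is not None:
                return [head] + tail  # the linking element itself is dropped
    return None


def presplit(sentence: str, lexicon: Set[str] = WORD_LIST) -> str:
    """Split compounds in a whitespace-tokenised sentence before sending it to the server."""
    out = []
    for token in sentence.split():
        parts = split_compound(token, lexicon)
        out.append(" ".join(parts) if parts and len(parts) > 1 else token)
    return " ".join(out)


print(presplit("De systeemontwikkelingsbeheerder belt"))
```

The sketch ignores punctuation attached to tokens and lowercases the parts it splits; a production splitter would need to handle both.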

With OpenNMT v9 I have had more success dealing with OOVs by putting the dictionary look-up in a post-processing model that runs on the output from OpenNMT rather than through Attention at the end of inference.
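
As a rough illustration of that kind of post-processing step, the sketch below assumes that OOV source words arrive in the output unchanged (copied through), which depends on how the engine is configured. The function name `replace_oov`, the toy dictionary and the toy target vocabulary are all hypothetical; none of this is OpenNMT's own API.

```python
from typing import Dict, Set


def replace_oov(output: str, dictionary: Dict[str, str], target_vocab: Set[str]) -> str:
    """Replace copied-through source words in the MT output with dictionary translations."""
    fixed = []
    for token in output.split():
        key = token.lower()
        # Only touch tokens that are not target-language words and that the dictionary knows.
        if key not in target_vocab and key in dictionary:
            fixed.append(dictionary[key])
        else:
            fixed.append(token)
    return " ".join(fixed)


# Toy data, for illustration only.
dictionary = {"beheerder": "manager"}
target_vocab = {"the", "signed", "contract"}
print(replace_oov("the beheerder signed the contract", dictionary, target_vocab))
```

Checking against the target vocabulary first is a design choice: it keeps the look-up from rewriting words that happen to be spelled the same in both languages.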