Bad data in WMT14 en-de

Has anybody noticed that sentences 8-49 in WMT14’s train.de are in English,
and unrelated to the corresponding sentences in train.en?

Can anyone else confirm that?

Is that deliberate, to build in robustness to bad training data?
Or a mistake?
Has anyone checked the rest of it?

  5  die Mitteilungen sollen den geschäftlichen kommerziellen Charakter tragen.
  6  der Vertrieb Ihrer Waren und Dienstleistungen durch das Postfach-System WIRD NICHT ZUGELASSEN.
  7  die Werbeversande (Spam) und andere unkorrekte Informationen werden gelöscht.
  8  ACDSee 9 Photo Manager Organize your photos. Share your world.
  9  No matter what kind of photos you take - of friends and family or artistic shots as a hobby - you need photo software that organizes your shots AND allows you to view, fix, and share them quickly and easily.
 10  ACDSee 9 makes organizing your photos exactly that: Quick and easy, so you can play with and share the great photos you've got...
 11  Your photo collection is growing daily. Family pictures, travel pictures, pictures of your home and garden - with so many photos to look through, how will you find and organize your best ones?
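
For checking the rest of the file, here is a rough sketch of one way to flag lines in train.de that look English. It is only a heuristic: the word lists and the function names (`looks_english`, `suspicious_lines`) are made up for illustration and are not part of any WMT tooling, and a short or name-heavy line can easily be misjudged.

```python
import re

# Hand-picked function words; these lists are an assumption for illustration,
# not a real language-identification model.
EN_WORDS = {"the", "and", "of", "to", "you", "your", "with", "that",
            "is", "for", "so", "what", "how", "will"}
DE_WORDS = {"der", "die", "das", "und", "den", "nicht", "werden", "ist",
            "mit", "für", "durch", "ein", "eine", "sollen", "von", "zu", "ihrer"}

# Runs of Unicode letters (no digits, no underscore).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def looks_english(line):
    """Return True if the line has more English than German function words."""
    tokens = [t.lower() for t in TOKEN_RE.findall(line)]
    en = sum(t in EN_WORDS for t in tokens)
    de = sum(t in DE_WORDS for t in tokens)
    return en > de


def suspicious_lines(lines):
    """Return the 1-based numbers of the lines that look English."""
    return [i for i, line in enumerate(lines, start=1) if looks_english(line)]
```

To run it over the whole file, something like `with open("train.de", encoding="utf-8") as f: print(suspicious_lines(f))` should do. A proper language-identification tool would be more reliable; this is just enough to see whether the problem goes beyond lines 8-49.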

I don’t think you will ever find a dataset without issues of one kind or another.
