RuntimeError: One input stream has less examples than the others

bnicholl · April 30, 2024, 7:21pm

When I call
translations_subworded = translator.translate_batch(source_sents_subworded_[0:4], batch_type="tokens", max_batch_size=2024, beam_size=4, target_prefix=target_prefix, max_input_length = 200, max_decoding_length=200)

where source_sents_subworded_[0:4] is the object below:

[['spa_Latn',
  '▁El',
  '▁erot',
  'ismo',
  '▁liter',
  'ario',
  '▁no',
  '▁tiene',
  '▁l',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'mit',
  'es',
  '.',
  '▁La',
  '▁mente',
  '▁humana',
  '▁no',
  '▁tiene',
  '▁l',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'mit',
  'es',
  '▁para',
  '▁dise',
  'Ã',
  'ƒ',
  'Â',
  '±',
  'ar',
  '▁contenido',
  '▁er',
  'Ã',
  'ƒ',
  'Â',
  '3',
  'tico',
  '.',
  '▁Ã',
  '‚',
  'Â',
  '¡',
  'H',
  'ola',
  '!',
  '▁Soy',
  '</s>'],
 ['spa_Latn',
  '▁Daniel',
  '▁R',
  '.',
  '▁Vil',
  'lar',
  'real',
  ';',
  '▁m',
  'Ã',
  'ƒ',
  'Â',
  '¡',
  's',
  '▁conocido',
  '▁como',
  '▁Daniel',
  '▁Star',
  'c',
  'row',
  ';',
  '▁soy',
  '▁escritor',
  '▁fantas',
  'ma',
  ',',
  '▁este',
  '▁es',
  '▁mi',
  '▁primer',
  '▁v',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'deo',
  '.',
  '▁Aunque',
  '▁no',
  '▁soy',
  '▁as',
  'idu',
  'o',
  '▁a',
  '▁este',
  '▁tipo',
  '▁de',
  '▁cosas',
  '▁si',
  '▁quieres',
  '▁m',
  '</s>'],
 ['spa_Latn',
  'Ã',
  'ƒ',
  'Â',
  '¡',
  's',
  '▁v',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'deos',
  '▁as',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  '▁(',
  'Sin',
  '▁promo',
  '▁de',
  '▁mis',
  '▁libros',
  ')',
  '▁solo',
  '▁tienes',
  '▁que',
  '▁darle',
  '▁like',
  ';',
  '▁puedes',
  '▁seguir',
  'me',
  '▁si',
  '▁quieres',
  ',',
  '▁ahora',
  '▁se',
  '▁supone',
  '▁que',
  '▁ten',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'a',
  '▁que',
  '▁decir',
  '▁que',
  '▁le',
  '▁des',
  '</s>'],
 ['spa_Latn',
  '▁a',
  '▁la',
  '▁campan',
  'ita',
  '▁pero',
  '▁verd',
  'ader',
  'amente',
  '▁me',
  '▁es',
  '▁indifer',
  'ente',
  '▁si',
  '▁le',
  '▁das',
  '▁o',
  '▁no',
  '.',
  '▁gracias',
  '▁',
  ':)',
  '▁P',
  'ongo',
  '▁el',
  '▁v',
  'Ã',
  'ƒ',
  'Â',
  '\xad',
  'deo',
  '▁con',
  '▁restric',
  'ci',
  'Ã',
  'ƒ',
  'Â',
  '3',
  'n',
  '▁de',
  '▁edad',
  '▁ya',
  '▁que',
  '▁cuando',
  '▁ha',
  'ble',
  '▁de',
  '▁estas',
  '▁cosas',
  '▁no',
  '▁quiero',
  '</s>']]

I get this error:

RuntimeError: One input stream has less examples than the others

But if I call the same function with any slice of three lists, it works. So, for example,

translations_subworded = translator.translate_batch(source_sents_subworded_[1:4], batch_type="tokens", max_batch_size=2024, beam_size=4, target_prefix=target_prefix, max_input_length = 200, max_decoding_length=200)

or

translations_subworded = translator.translate_batch(source_sents_subworded_[0:3], batch_type="tokens", max_batch_size=2024, beam_size=4, target_prefix=target_prefix, max_input_length = 200, max_decoding_length=200)

both work fine.

Any idea what could be happening?
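
One thing that may be worth checking: the wording of the error suggests that the parallel inputs passed to translate_batch do not contain the same number of examples. The definition of target_prefix is not shown above, so this is only a guess, but if it holds three entries, that would fit the pattern where every slice of three works and the slice of four fails. Below is a minimal sketch of that check. The helper check_parallel_inputs, the toy token lists, and the "eng_Latn" target code are all hypothetical stand-ins, not part of the original code or of the library.

```python
def check_parallel_inputs(source, target_prefix):
    """Compare the number of examples in source and target_prefix.

    Hypothetical helper for debugging; returns (ok, message).
    """
    if target_prefix is None:
        return True, "no target_prefix given"
    n_src, n_tgt = len(source), len(target_prefix)
    if n_src != n_tgt:
        return False, f"source has {n_src} examples but target_prefix has {n_tgt}"
    return True, f"both inputs have {n_src} examples"


# Toy stand-ins for the real data (assumed shapes, not the actual inputs).
source = [["spa_Latn", "▁a", "</s>"] for _ in range(4)]
target_prefix = [["eng_Latn"] for _ in range(3)]  # assumed: one entry too few

ok, message = check_parallel_inputs(source, target_prefix)
print(ok, message)

# Building the prefix from the batch itself keeps the two lists aligned.
batch = source[0:4]
fixed_prefix = [["eng_Latn"] for _ in batch]
fixed_ok, fixed_message = check_parallel_inputs(batch, fixed_prefix)
print(fixed_ok, fixed_message)
```

If the lengths do differ, deriving target_prefix from whichever slice is being translated (as in the last lines of the sketch) should keep the two lists the same size.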