Index Error when running inference

I am working on a simple translation task. Training works fine; however, when I run inference, I get the following error:

File "/data/envs/open_nmt/bin/onmt_translate", line 8, in <module>
    sys.exit(main())
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 60, in main
    translate(opt)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 41, in translate
    _, _ = translator._translate(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 345, in _translate
    batch_data = self.translate_batch(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 723, in translate_batch
    return self._translate_batch_with_strategy(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 813, in _translate_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/beam_search.py", line 295, in advance
    current_attn = attn.index_select(1, self.select_indices)
RuntimeError: INDICES element is out of DATA bounds, id=1 axis_dim=1

Any hints regarding what could be causing this error? I am running onmt_translate -config g2t.yaml on CPU. The g2t.yaml file looks as follows:

batch_type: tokens
replace_unk: true
beam_size: 4
model: ckpts/model_step_1000.pt
src: data/phoenix2014T.test.gloss
tgt: data/phoenix2014T.test.de
output: pred.txt
batch_size: 8

Removing replace_unk: true gives the following error instead:

Traceback (most recent call last):
  File "/data/envs/open_nmt/bin/onmt_translate", line 8, in <module>
    sys.exit(main())
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 60, in main
    translate(opt)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 41, in translate
    _, _ = translator._translate(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 345, in _translate
    batch_data = self.translate_batch(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 723, in translate_batch
    return self._translate_batch_with_strategy(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 772, in _translate_batch_with_strategy
    gold_score = self._gold_score(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 292, in _gold_score
    gs = self._score_target(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 857, in _score_target
    log_probs, attn = self._decode_and_generate(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 539, in _decode_and_generate
    dec_out, dec_attn = self.model.decoder(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/decoders/transformer.py", line 461, in forward
    dec_out, attn, attn_align = layer(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/decoders/transformer.py", line 103, in forward
    layer_out, attns = self._forward(*args, **kwargs)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/decoders/transformer.py", line 262, in _forward
    query, _ = self._forward_self_attn(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/decoders/transformer.py", line 149, in _forward_self_attn
    return self.self_attn(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/modules/multi_headed_attn.py", line 181, in forward
    key = torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 8 for tensor number 1 in the list.

Hello!

If the batch type is tokens, then batch_size is counted in tokens rather than sentences, so it should be much higher than 8. Try 1024 or 2048.
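
For example (a minimal sketch, keeping the rest of your g2t.yaml unchanged):

batch_type: tokens
batch_size: 2048   # with batch_type: tokens, this counts tokens, not sentences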

Kind regards,
Yasmin

Hi,
I tried with batch_size=1024 and batch_size=2048 and still get (almost) the same error. The last line has changed to:

RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=1

The rest of the error message is exactly the same.

Regarding the file that should be translated: what does it include? Can you share a few sentences from the top and bottom of the file?

This is the exact file.
Here are the first 3 lines:

ABER FREUEN
MORGEN SONNE
SAMSTAG WECHSELHAFT

and the last 3 lines from the file:

FREITAG SONNE WOLKE WECHSELHAFT SCHAUER KOENNEN IX ABER SONNE LANG
MORGEN WETTER WIE-AUSSEHEN ERSTE APRIL DONNERSTAG
FLUSS REGION SCHAUER NORDWEST WARM KOMMEN

These tokens are German sign glosses, i.e., German words that represent what a person is signing (in a sign language video).

Maybe @guillaumekln has more insights.

Looks like there is a shape issue during beam search.

@vince62s @francoishernandez Any idea what could be the issue here?

Try removing the “tgt” line from your inference config file (and remove replace_unk too).
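
Assuming the rest of g2t.yaml stays the same, that would leave something like:

batch_type: tokens
batch_size: 2048   # or 1024, as suggested above
beam_size: 4
model: ckpts/model_step_1000.pt
src: data/phoenix2014T.test.gloss
output: pred.txt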

@vince62s Thanks for the suggestion. It works when I remove both the tgt and replace_unk lines.
However, I want to keep replace_unk: true in order to reproduce the results of a research paper. When I remove only tgt and keep replace_unk, I get the following errors:

Running on CPU:

Traceback (most recent call last):
  File "/data/envs/open_nmt/bin/onmt_translate", line 8, in <module>
    sys.exit(main())
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 60, in main
    translate(opt)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 41, in translate
    _, _ = translator._translate(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 345, in _translate
    batch_data = self.translate_batch(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 723, in translate_batch
    return self._translate_batch_with_strategy(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 813, in _translate_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/beam_search.py", line 295, in advance
    current_attn = attn.index_select(1, self.select_indices)
RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=1

Running on GPU (gpu: 0 in config file):

../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [0,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [0,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
(the same indexSelectSmallIndex assertion repeats for threads [66,0,0] through [79,0,0])
Traceback (most recent call last):
  File "/data/envs/open_nmt/bin/onmt_translate", line 8, in <module>
    sys.exit(main())
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 60, in main
    translate(opt)
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/bin/translate.py", line 41, in translate
    _, _ = translator._translate(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 345, in _translate
    batch_data = self.translate_batch(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 723, in translate_batch
    return self._translate_batch_with_strategy(
  File "/data/envs/open_nmt/lib/python3.8/site-packages/onmt/translate/translator.py", line 815, in _translate_batch_with_strategy
    if any_finished:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Did you train a transformer or an RNN?

replace_unk does not work with transformers.

group.add('--replace_unk', '-replace_unk', action="store_true",
          help="Replace the generated UNK tokens with the "
               "source token that had highest attention weight. If "
               "phrase_table is provided, it will look up the "
               "identified source token and give the corresponding "
               "target token. If it is not provided (or the identified "
               "source token does not exist in the table), then it "
               "will copy the source token.")

With transformers we don’t have alignments (attention is spread across several heads).
We could try the with_align option, but it’s not implemented yet.

I am training a transformer. The issue you referenced above has conflicting views on whether replace_unk works with a transformer model: @guillaumekln says that replace_unk does not apply to transformers, but @tel34 says that he was able to handle unks using a Transformer.