OpenNMT Forum

IndexError or AssertionError while translating featured text

Hi,
I am using OpenNMT-py 1.0.0rc2.
I set up a very small example with two featured texts (one feature on each word).
Preprocessing and training run correctly.
But translation fails.

If I translate a tagged file, I get:
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/modules/util_class.py", line 25, in forward
assert len(self) == len(inputs_)
AssertionError

If I translate a non-tagged file, I get:
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/inputters/text_dataset.py", line 68, in
tokens = [t.split(feat_delim)[layer] for t in tokens]
IndexError: list index out of range

What's wrong? I can't manage to set up a translator that uses features…

Thanks, JB

Hey @yunes

If I translate a non-tagged file, I get:
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/inputters/text_dataset.py", line 68, in
tokens = [t.split(feat_delim)[layer] for t in tokens]
IndexError: list index out of range

-> If your model was trained with features, it looks for them at translation time but can't find them.
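This failure mode is easy to reproduce in isolation. Below is an illustrative sketch of the line quoted from text_dataset.py; the helper name `extract_layer` and the surrounding scaffolding are mine, not OpenNMT's actual API, and the separator is shown here as "│" (the exact delimiter character OpenNMT uses may render differently).

```python
feat_delim = "│"  # word/feature separator (assumed; see note above)

def extract_layer(tokens, layer):
    # layer 0 = the words themselves, layer 1 = the first feature, ...
    return [t.split(feat_delim)[layer] for t in tokens]

tagged = "Abbaye│TOK de│TOK".split()
print(extract_layer(tagged, 1))  # ['TOK', 'TOK'] — features found

untagged = "Abbaye de".split()
# A model trained with one feature asks for layer 1, but splitting
# "Abbaye" on the delimiter yields a one-element list, hence:
# extract_layer(untagged, 1)  -> IndexError: list index out of range
```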

If I translate a tagged file, I get:
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/modules/util_class.py", line 25, in forward
assert len(self) == len(inputs_)
AssertionError

-> This one is stranger. Can you give an example of your inputs (training data plus what you're trying to translate)?

Also, since your example is a very small toy, it might be easier for everyone if you just share your data and model so that we can have a look.

PS: when posting errors, please post the full traceback; it helps track the exact code path.

Thanks for your reply. Please find more detailed information below.

DATA:
test-fr-pos.txt file:
Abbaye│TOK de│TOK Cleeve│TOK
Le│TOK dortoir│TOK de│TOK

test-en-pos.txt file:
Cleeve│TOK Abbey│TOK
The│TOK Dormitory│TOK at│TOK

PREPROCESS made with:
onmt_preprocess -train_src test-fr-pos.txt -train_tgt test-en-pos.txt -valid_src test-fr-pos.txt -valid_tgt test-en-pos.txt -save_data preprocess

TRAINING made with:
onmt_train --batch_size=1 --train_steps 4 --save_checkpoint_steps 2 -world_size 1 -gpu_ranks 0 -save_model output/model -data output/preprocess -world_size 1 -gpu_ranks 0

TRANSLATION tried with:
onmt_translate -gpu 0 -replace_unk -model output/model_step_4.pt -src essai-pos.txt -output translation.txt

With the essai-pos.txt file:
Abbaye│TOK Abbaye│TOK

and the traceback is:

[2020-04-24 00:48:39,429 INFO] Translating shard 0.
/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
var = torch.tensor(arr, dtype=self.dtype, device=device)
Traceback (most recent call last):
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/bin/onmt_translate", line 8, in
sys.exit(main())
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/bin/translate.py", line 49, in main
translate(opt)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/bin/translate.py", line 33, in translate
attn_debug=opt.attn_debug
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/translate/translator.py", line 351, in translate
batch, data.src_vocabs, attn_debug
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/translate/translator.py", line 544, in translate_batch
return_attention=attn_debug or self.replace_unk)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/translate/translator.py", line 693, in _translate_batch
batch_offset=beam._batch_offset)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/translate/translator.py", line 582, in _decode_and_generate
decoder_in, memory_bank, memory_lengths=memory_lengths, step=step
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/decoders/decoder.py", line 214, in forward
tgt, memory_bank, memory_lengths=memory_lengths)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/decoders/decoder.py", line 381, in _run_forward_pass
emb = self.embeddings(tgt)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/modules/embeddings.py", line 277, in forward
source = self.make_embedding(source)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yunes/anaconda3/envs/OpenNMT1.0.0rc2/lib/python3.7/site-packages/onmt/modules/util_class.py", line 32, in forward
assert len(self) == len(inputs_)
AssertionError

Thanks for the nice and easily reproducible example.
The issue here is that target features are not currently supported. Your training works as a side effect rather than by design, I think (e.g. if you try the same with a Transformer model, it will raise an error).
I have a WIP implementation to support target features. It's mostly working if you want to give it a try. IIRC I only tested it on Transformer, but RNN support should not be too difficult to add.
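A rough way to picture why the tagged file trips the assert: the embedding layer holds one lookup table per annotation layer (words plus each feature), but at decode time the beam feeds only word ids back to the target embeddings, so the stream count no longer matches. Below is a pure-Python sketch of that check; the real Elementwise in onmt/modules/util_class.py is a torch nn.ModuleList over tensors, and this simplification is mine.

```python
class Elementwise(list):
    """One lookup table per input stream (words + each feature)."""

    def forward(self, streams):
        # streams: one list of ids per annotation layer
        assert len(self) == len(streams)  # must match the trained layout
        return [[table[i] for i in ids] for table, ids in zip(self, streams)]

word_table = ["<unk>", "Abbaye"]
feat_table = ["<unk>", "TOK"]
emb = Elementwise([word_table, feat_table])

# Training-time input: words + one feature layer -> the assert passes.
print(emb.forward([[1, 1], [1, 1]]))  # [['Abbaye', 'Abbaye'], ['TOK', 'TOK']]

# Decode-time target input: word ids only, no feature layer, so
# len(streams) == 1 != len(self) == 2:
# emb.forward([[1, 1]])  -> AssertionError
```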

Thanks for your reply.

What do you mean by "with a Transformer model"? OpenNMT-tf?
In any case, thanks again!

Best regards.

Transformer is a seq2seq architecture proposed by Google in 2017. It performs much better than RNNs on most seq2seq tasks.
See here for how to use it with OpenNMT-py.
(The -tf in OpenNMT-tf stands for TensorFlow.)

Thanks! I am very new to this field…