OpenNMT-tf : EOFError("No valid references for a sentence")

alexeir · July 8, 2021, 3:19pm

Yesterday we updated to the latest OpenNMT-tf version (2.20.1). The previous version worked fine, but it had not been updated for about a year, so I don’t know exactly what changed in between. Today we get an error at the BLEU evaluation step. The training and validation data seem fine and worked with our old OpenNMT-tf version.

…

2021-07-08 14:35:37.423000: I runner.py:281] Step = 3500 ; steps/s = 2.20, source words/s = 49025, target words/s = 60058 ; Learning rate = 0.000612 ; Loss = 2.551785

2021-07-08 14:39:25.243000: I runner.py:281] Step = 4000 ; steps/s = 2.19, source words/s = 49013, target words/s = 60033 ; Learning rate = 0.000699 ; Loss = 2.458740

2021-07-08 14:43:13.988000: I runner.py:281] Step = 4500 ; steps/s = 2.19, source words/s = 48274, target words/s = 59129 ; Learning rate = 0.000786 ; Loss = 2.332188

2021-07-08 14:47:01.425000: I runner.py:281] Step = 5000 ; steps/s = 2.20, source words/s = 49088, target words/s = 60138 ; Learning rate = 0.000874 ; Loss = 2.241347

2021-07-08 14:47:01.642000: I training.py:186] Saved checkpoint 1.train_result/ckpt-5000

2021-07-08 14:47:01.643000: I training.py:202] Running evaluation for step 5000

2021-07-08 14:47:05.381000: W deprecation.py:534] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.

Instructions for updating:

The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.

2021-07-08 14:47:07.237100: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.237238: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.239548: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.242189: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.252516: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.254303: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.254996: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:07.258484: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

2021-07-08 14:47:39.548000: I training.py:202] Evaluation predictions saved to 1.train_result/eval/predictions.txt.5000

Traceback (most recent call last):
  File "/usr/local/bin/onmt-main", line 8, in
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/opennmt/bin/main.py", line 326, in main
    hvd=hvd,
  File "/usr/local/lib/python3.6/dist-packages/opennmt/runner.py", line 281, in train
    moving_average_decay=train_config.get("moving_average_decay"),
  File "/usr/local/lib/python3.6/dist-packages/opennmt/training.py", line 145, in __call__
    evaluator, step, moving_average=moving_average
  File "/usr/local/lib/python3.6/dist-packages/opennmt/training.py", line 202, in _evaluate
    evaluator(step)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/evaluation.py", line 343, in __call__
    score = scorer(self._labels_file, output_path)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/utils/scorers.py", line 92, in __call__
    bleu = sacrebleu.corpus_bleu(sys_stream, [ref_stream], force=True)
  File "/usr/local/lib/python3.6/dist-packages/sacrebleu/compat.py", line 36, in corpus_bleu
    sys_stream, ref_streams, use_effective_order=use_effective_order)
  File "/usr/local/lib/python3.6/dist-packages/sacrebleu/metrics/bleu.py", line 286, in corpus_score
    raise EOFError("No valid references for a sentence!")
EOFError: No valid references for a sentence

guillaumekln · July 8, 2021, 3:25pm

Can you post the following information:

The version you upgraded from
The content of the YAML configuration file

guillaumekln · July 8, 2021, 4:01pm

Looking at the code of SacreBLEU, you probably have an empty line in your evaluation file.
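
A minimal sketch for checking this, assuming the evaluation file is UTF-8 plain text with one sentence per line. The function name `find_blank_lines` is made up for illustration and is not part of OpenNMT-tf or SacreBLEU. It reports the 1-based numbers of lines that are empty or contain only whitespace. A single newline character at the very end of the file does not count as an empty line.

```python
from typing import List


def find_blank_lines(path: str) -> List[int]:
    """Return the 1-based numbers of lines that are empty or whitespace-only.

    Hypothetical helper for illustration; not part of OpenNMT-tf or SacreBLEU.
    """
    blank = []
    with open(path, encoding="utf-8") as f:
        # Iterating over the file yields each line with its trailing newline,
        # so a final "\n" terminating the last sentence is not reported.
        for number, line in enumerate(f, start=1):
            if not line.strip():
                blank.append(number)
    return blank
```

Calling `find_blank_lines` on the reference file used for evaluation should return an empty list. Any number it returns is a line worth inspecting.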

sandeepch · September 15, 2021, 12:51pm

Hi, do you have an update on this issue?

Edit: I checked that my validation file has no empty lines (except the last line).

guillaumekln · September 15, 2021, 1:20pm

Can you try with the latest version of SacreBLEU?

pip install --upgrade sacrebleu

sandeepch · September 16, 2021, 8:09am

Using a version older than v2.0.0 at present. I will upgrade, try again, and report back on whether it works.

sandeepch · September 17, 2021, 8:51am

I tried the latest version of sacrebleu with OpenNMT-tf v2.20.x, and it still gives the same error.

I do have a blank line at the end of the file, but that’s a necessity for training the tokenizer. Other than that, I don’t see any empty lines, so I’m not sure what the issue is…

guillaumekln · September 17, 2021, 8:57am

Do you mean the file ends with the newline character \n, or do you actually have an empty line at the end of the file? The final newline character is fine, but empty lines should be removed.

SacreBLEU may also report this error if a prediction is empty. You can try this parameter:

params:
  minimum_decoding_length: 1

sandeepch · September 17, 2021, 9:07am

It looks like they only check the references and not the predictions, though: sacrebleu/bleu.py at 5dfcaa3cee00039bcad7a12147b6d6976cb46f42 · mjpost/sacrebleu · GitHub

I am going to double-check whether it’s a newline character or an extra line getting added in the pipeline, and get back to you.

sandeepch · September 17, 2021, 11:41am

@guillaumekln I seem to have found the issue.

The input file had a line containing only a " character. That quote was tokenized to an empty line during the tokenization step, which caused sacrebleu to error out during evaluation.

guillaumekln · September 17, 2021, 4:56pm

Are you referring to the SacreBLEU tokenization? Because on our side we are passing the reference file as is to SacreBLEU.

sandeepch · September 19, 2021, 9:01am

No, I mean the SentencePiece tokenization in OpenNMT. The tokenized validation files had an empty line where the " was tokenized by SentencePiece. The SacreBLEU error happened during the evaluation step while training, so the reference file here is the tokenized reference file.
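
A rough sketch of how such lines could be located before training, assuming the tokenizer can be wrapped as a function that maps a string to a list of tokens. Both `lines_emptied_by_tokenization` and `toy_tokenize` are hypothetical names introduced here. The toy tokenizer simply drops double quotes to imitate the behaviour described above. It is not SentencePiece and makes no claim about how SentencePiece treats quotes.

```python
from typing import Callable, Iterable, List, Tuple


def lines_emptied_by_tokenization(
    lines: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> List[Tuple[int, str]]:
    """Return (1-based line number, raw text) for every non-blank input line
    whose tokenization yields no tokens at all."""
    emptied = []
    for number, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        # Lines that are already blank are a separate problem and are skipped.
        if raw.strip() and not tokenize(raw):
            emptied.append((number, raw))
    return emptied


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: drops double quotes, splits on spaces."""
    return text.replace('"', " ").split()


if __name__ == "__main__":
    sample = ['Hello "world"\n', '"\n', "Bye\n"]
    for number, raw in lines_emptied_by_tokenization(sample, toy_tokenize):
        print(f"line {number} becomes empty after tokenization: {raw!r}")
```

To check real data, the actual tokenizer would be passed in place of `toy_tokenize`, and the raw validation file would be opened and passed as `lines`.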

guillaumekln · September 20, 2021, 7:28am

I’m still not sure how this can happen.

If you used on-the-fly tokenization in OpenNMT, we pass the original non tokenized reference file to SacreBLEU.
If you tokenize the data yourself before the training, can you provide more information on how to make SentencePiece create an empty line from a quote character?