Yesterday we updated to the latest OpenNMT-tf version (2.20.1). The previous version worked fine but had not been updated for about a year, so I don't know exactly what has changed since then. Today we get an error at the BLEU evaluation step. The training and validation data seem fine and worked with our old OpenNMT-tf version.
…
2021-07-08 14:35:37.423000: I runner.py:281] Step = 3500 ; steps/s = 2.20, source words/s = 49025, target words/s = 60058 ; Learning rate = 0.000612 ; Loss = 2.551785
2021-07-08 14:39:25.243000: I runner.py:281] Step = 4000 ; steps/s = 2.19, source words/s = 49013, target words/s = 60033 ; Learning rate = 0.000699 ; Loss = 2.458740
2021-07-08 14:43:13.988000: I runner.py:281] Step = 4500 ; steps/s = 2.19, source words/s = 48274, target words/s = 59129 ; Learning rate = 0.000786 ; Loss = 2.332188
2021-07-08 14:47:01.425000: I runner.py:281] Step = 5000 ; steps/s = 2.20, source words/s = 49088, target words/s = 60138 ; Learning rate = 0.000874 ; Loss = 2.241347
2021-07-08 14:47:01.642000: I training.py:186] Saved checkpoint 1.train_result/ckpt-5000
2021-07-08 14:47:01.643000: I training.py:202] Running evaluation for step 5000
2021-07-08 14:47:05.381000: W deprecation.py:534] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
2021-07-08 14:47:07.237100: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.237238: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.239548: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.242189: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.252516: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.254303: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.254996: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:07.258484: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: “Softmax” attr { key: “T” value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: “GPU” vendor: “NVIDIA” model: “NVIDIA Tesla T4” frequency: 1590 num_cores: 40 environment { key: “architecture” value: “7.5” } environment { key: “cuda” value: “11020” } environment { key: “cudnn” value: “8100” } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14476378112 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2021-07-08 14:47:39.548000: I training.py:202] Evaluation predictions saved to 1.train_result/eval/predictions.txt.5000
Traceback (most recent call last):
  File "/usr/local/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/opennmt/bin/main.py", line 326, in main
    hvd=hvd,
  File "/usr/local/lib/python3.6/dist-packages/opennmt/runner.py", line 281, in train
    moving_average_decay=train_config.get("moving_average_decay"),
  File "/usr/local/lib/python3.6/dist-packages/opennmt/training.py", line 145, in __call__
    evaluator, step, moving_average=moving_average
  File "/usr/local/lib/python3.6/dist-packages/opennmt/training.py", line 202, in _evaluate
    evaluator(step)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/evaluation.py", line 343, in __call__
    score = scorer(self._labels_file, output_path)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/utils/scorers.py", line 92, in __call__
    bleu = sacrebleu.corpus_bleu(sys_stream, [ref_stream], force=True)
  File "/usr/local/lib/python3.6/dist-packages/sacrebleu/compat.py", line 36, in corpus_bleu
    sys_stream, ref_streams, use_effective_order=use_effective_order)
  File "/usr/local/lib/python3.6/dist-packages/sacrebleu/metrics/bleu.py", line 286, in corpus_score
    raise EOFError("No valid references for a sentence!")
EOFError: No valid references for a sentence!
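
The message is raised while sacrebleu pairs each prediction with its reference, so one plausible cause (an assumption, not something the log confirms) is that the evaluation labels file contains an empty line that the older version tolerated. A minimal sketch for checking this is below; the helper name find_blank_lines and the example path are hypothetical and are not part of OpenNMT-tf or sacrebleu.

```python
from typing import List


def find_blank_lines(path: str) -> List[int]:
    """Return the 1-based numbers of lines that are empty or whitespace-only.

    Hypothetical diagnostic helper: it only inspects a text file and does not
    call OpenNMT-tf or sacrebleu.
    """
    blank = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # A line with nothing but whitespace carries no reference tokens.
            if not line.strip():
                blank.append(number)
    return blank


# Example usage (the path is an assumption; use your own labels file):
# print(find_blank_lines("data/valid.tgt"))
```

If the list is not empty, removing those lines together with the matching lines of the source file keeps both files aligned. Whether this is the actual cause here is unverified.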