ValueError: Tensor conversion requested dtype string for Tensor with dtype float32

iriniz · October 9, 2020, 9:57am

Hello,

I am trying to train a model with a vocabulary size of 50,000 on 403,099 segments, but right after the first checkpoint I get the following error:

Invalid argument: ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>

BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>

Below you can find my data.yml config file:

model_dir: run/
data:
  train_features_file: src-train.txt
  train_labels_file: tgt-train.txt
  eval_features_file: src-val.txt
  eval_labels_file: tgt-val.txt
  source_vocabulary: src-sp-vocab
  target_vocabulary: tgt-sp-vocab
  source_tokenization: tok.yml
  target_tokenization: tok.yml
params:
  replace_unknown_target: true
  dropout: 0.2
train:
  average_last_checkpoints: 6
  save_checkpoints_steps: 1000
  keep_checkpoint_max: 6
  max_step: 20000
  batch_size: 1024
eval:
  steps: 1000
  save_eval_predictions: true
  external_evaluators: [bleu]
  early_stopping:
    metric: bleu
    min_improvement: 0.2
    steps: 4
  export_on_best: bleu
infer:
  batch_size: 32

Below you can find the training command:

CUDA_VISIBLE_DEVICES=0,1 onmt-main --model_type Transformer --config $3/$1_$2/data.yml --auto_config --mixed_precision train --num_gpus 2 --with_eval

Please note:
- The command above uses two CUDA devices, but I get the same error when I use only one.
- I get the same error when I use a different batch size.
- I get the same error when I reduce the number of sentences in the training, validation and test sets.
- I have not encountered this error when training other models.

Could someone help me, please?

Thank you a lot in advance.

guillaumekln · October 9, 2020, 10:02am

Hi,

Could you post the full training logs if possible?

iriniz · October 9, 2020, 10:13am

INFO:tensorflow:Using parameters:
data:
  eval_features_file: <userpath>/src-val.txt
  eval_labels_file: <userpath>/tgt-val.txt
  source_tokenization: <userpath>/tok.yml
  source_vocabulary: <userpath>/src-sp-vocab
  target_tokenization: <userpath>/tok.yml
  target_vocabulary: <userpath>/tgt-sp-vocab
  train_features_file: <userpath>/src-train.txt
  train_labels_file: <userpath>/tgt-train.txt
eval:
  batch_size: 32
  early_stopping:
    metric: bleu
    min_improvement: 0.2
    steps: 4
  export_on_best: bleu
  external_evaluators:
  - bleu
  save_eval_predictions: true
  steps: 1000
infer:
  batch_size: 32
  length_bucket_width: 5
model_dir: <userpath>/run/
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  dropout: 0.2
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
  replace_unknown_target: true
score:
  batch_size: 64
train:
  average_last_checkpoints: 6
  batch_size: 1024
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 6
  length_bucket_width: 1
  max_step: 20000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 1000
  save_summary_steps: 100

2020-10-09 10:24:24.876549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-09 10:24:24.925241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-10-09 10:24:24.925741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
2020-10-09 10:24:24.925881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-09 10:24:24.926741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-09 10:24:24.927443: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-09 10:24:24.927613: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-09 10:24:24.928650: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-09 10:24:24.929485: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-09 10:24:24.932103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-09 10:24:24.934085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-10-09 10:24:24.934363: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-09 10:24:24.938918: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-10-09 10:24:24.939174: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c2190 executing computations on platform Host. Devices:
2020-10-09 10:24:24.939187: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-10-09 10:24:25.114879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555db10 executing computations on platform CUDA. Devices:
2020-10-09 10:24:25.114901: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-10-09 10:24:25.114906: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-10-09 10:24:25.115565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-10-09 10:24:25.116028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
2020-10-09 10:24:25.116054: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-09 10:24:25.116064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-09 10:24:25.116073: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-09 10:24:25.116081: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-09 10:24:25.116089: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-09 10:24:25.116097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-09 10:24:25.116105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-09 10:24:25.117873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-10-09 10:24:25.117900: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-09 10:24:25.119042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-09 10:24:25.119052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 
2020-10-09 10:24:25.119056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N 
2020-10-09 10:24:25.119060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N 
2020-10-09 10:24:25.120431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-10-09 10:24:25.121392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10319 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
WARNING:tensorflow:No checkpoint to restore in <userpath>/run/
INFO:tensorflow:Training on 396099 examples
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/summary/summary_iterator.py:68: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:tensorflow:Accumulate gradients of 13 iterations to reach effective batch size of 25000
2020-10-09 10:24:29.075951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-10-09 10:24:29.076425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
2020-10-09 10:24:29.076449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-09 10:24:29.076458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-09 10:24:29.076465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-10-09 10:24:29.076472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-10-09 10:24:29.076479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-10-09 10:24:29.076486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-10-09 10:24:29.076493: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-09 10:24:29.077934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-10-09 10:24:29.077959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-09 10:24:29.077965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 
2020-10-09 10:24:29.077969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N 
2020-10-09 10:24:29.077973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N 
2020-10-09 10:24:29.079071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 10213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-10-09 10:24:29.079524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:1 with 10319 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
2020-10-09 10:24:29.571429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:253: _EagerTensorBase.cpu (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.identity instead.
INFO:tensorflow:Saved checkpoint <userpath>/run/ckpt-0
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-10-09 10:25:53.374810: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1740] Converted 2544/41281 nodes to float16 precision using 636 cast(s) to float16 (excluding Const and Variable casts)
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:batch_all_reduce: 260 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
2020-10-09 10:26:43.675084: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2020-10-09 10:26:55.935770: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1740] Converted 1272/16989 nodes to float16 precision using 318 cast(s) to float16 (excluding Const and Variable casts)
2020-10-09 10:27:19.545273: W tensorflow/core/framework/op_kernel.cc:1610] Invalid argument: ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


2020-10-09 10:27:19.746781: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


	 [[{{node EagerPyFunc_1}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
	 [[Func/cond_4/then/_70/gradients/global_norm/write_summary/summary_cond/then/_3368/input/_3389/_914]]
2020-10-09 10:27:19.746908: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


	 [[{{node EagerPyFunc_1}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
2020-10-09 10:27:19.747031: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


	 [[{{node EagerPyFunc_1}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
	 [[cond_3/switch_pred/_51/_24]]
Traceback (most recent call last):
  File "/usr/local/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/opennmt/bin/main.py", line 189, in main
    checkpoint_path=args.checkpoint_path)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/runner.py", line 196, in train
    export_on_best=eval_config.get("export_on_best"))
  File "/usr/local/lib/python3.6/dist-packages/opennmt/training.py", line 175, in __call__
    for i, (loss, num_words, skipped) in enumerate(_forward()):  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.6/dist-packages/opennmt/data/dataset.py", line 433, in _fun
    outputs = _tf_fun()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 487, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


	 [[{{node EagerPyFunc_1}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
	 [[Func/cond_4/then/_70/gradients/global_norm/write_summary/summary_cond/then/_3368/input/_3389/_914]]
  (1) Invalid argument:  {{function_node __inference_Dataset_map_<lambda>_136}} ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 219, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 125, in __call__
    self._convert(ret, dtype=self._out_dtypes[0]))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 100, in _convert
    return ops.convert_to_tensor(value, dtype=dtype)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1271, in internal_convert_to_tensor
    (dtype.name, value.dtype.name, value))

ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor: id=219983, shape=(0,), dtype=float32, numpy=array([], dtype=float32)>


	 [[{{node EagerPyFunc_1}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
0 successful operations.
1 derived errors ignored. [Op:__inference__tf_fun_101307]

Function call stack:
_tf_fun -> _tf_fun
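
The traceback above fails inside a Python function called from the dataset map (EagerPyFunc), which returned an empty float32 tensor where a string tensor was expected. The thread does not identify the root cause. Since TensorFlow converts an empty Python list to a float32 tensor by default, one thing that may be worth ruling out is an empty or whitespace-only line in the data files that tokenizes to nothing. This is an assumption, not a confirmed diagnosis. A minimal sketch for checking this (the helper name find_blank_lines is hypothetical; the file names are taken from the config above):

```python
from pathlib import Path
from typing import List


def find_blank_lines(path: str) -> List[int]:
    """Return the 1-based line numbers of empty or whitespace-only lines."""
    blank = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                blank.append(number)
    return blank


if __name__ == "__main__":
    # File names as listed in data.yml; adjust the paths to your setup.
    for name in ("src-train.txt", "tgt-train.txt", "src-val.txt", "tgt-val.txt"):
        if Path(name).exists():
            lines = find_blank_lines(name)
            print(f"{name}: {len(lines)} blank line(s), first few: {lines[:10]}")
```

If this reports blank lines, removing the affected sentence pairs from both the source and target files keeps the two sides aligned.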

guillaumekln · October 9, 2020, 10:37am

Just to make sure, do you still get the same error with the latest version of OpenNMT-tf?

pip install --upgrade OpenNMT-tf
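
To confirm which versions are actually installed before and after the upgrade, something like the following can be run in the same environment. This is only a sketch: it relies on importlib.metadata, which needs Python 3.8 or newer (the logs in this thread show Python 3.6, where `pip show OpenNMT-tf` gives the same information), and the helper name installed_version is illustrative.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("OpenNMT-tf", "tensorflow"):
        print(f"{package}: {installed_version(package)}")
```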

iriniz · October 9, 2020, 11:16am

Thank you for your prompt response, Guillaume.
We are going to upgrade to the latest version of OpenNMT-tf and will get back to you if the error persists.

iriniz · October 14, 2020, 2:36pm

We upgraded to the latest version of OpenNMT-tf and the error seems to have been fixed now.

Do you possibly know what the statement “I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do” indicates?

guillaumekln · October 14, 2020, 2:42pm

It can usually be ignored. When using mixed precision, not all TensorFlow graphs can be converted to float16.