FallbackException: Inputs are not already EagerTensors

Gerd · February 24, 2020, 8:11pm

Greetings.

I’m running into problems using a multisource model with one source being a TFRecord (each line of text corresponds to a sequence of floating-point vectors) and the other source being text. I was able to do this prior to OpenNMT 2. The TFRecord files have been created with OpenNMT 2.

I am getting the error “tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.”

I’m using OpenNMT-tf_v2.4.0, Python 3.6

Files are below. Thanks in advance for your help!

command line:

onmt-main \
  --config 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240/config.yml \
  --model 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240/model.py \
  --auto_config \
  train --with_eval \
  --num_gpus 2 \
  &>>2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240/log.out

yml:

model_dir: 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240
data:
  train_features_file: [
    data/train-nocrawl.sp32k.ru,
    data/train-nocrawl.wv200b.tf2.ru.gz
    ]
  train_labels_file: data/train-nocrawl.sp32k.en
  eval_features_file: [
    data/valid.sp32k.ru,
    data/valid.wv200b.tf2.ru.gz,
    ]
  eval_labels_file: data/valid.sp32k.en
  source_1_vocabulary: data/wmt19-ruen-ru-32k.onmt.vocab
  target_vocabulary: data/wmt19-ruen-en-32k.onmt.vocab
train:
  save_checkpoints_steps: 10000
  batch_size: 256
eval:
  external_evaluators: BLEU

model_description.py:

import opennmt as onmt

def model():
  return onmt.models.Transformer(
      source_inputter=onmt.inputters.ParallelInputter(
          inputters=[
              onmt.inputters.WordEmbedder(embedding_size=512),
              onmt.inputters.SequenceRecordInputter(input_depth=390),
          ],
          reducer=None,
          share_parameters=False,
          combine_features=True),
      target_inputter=onmt.inputters.WordEmbedder(embedding_size=512),
      num_layers=6,
      num_units=512,
      num_heads=8,
      ffn_inner_dim=1024,
      dropout=0.1,
      attention_dropout=0.1)

log:

OpenNMT-tf 2.4.0
INFO:tensorflow:Using parameters:
data:
  eval_features_file:
  - data/valid.sp32k.ru
  - data/valid.wv200b.tf2.ru.gz
  eval_labels_file: data/valid.sp32k.en
  source_1_vocabulary: data/wmt19-ruen-ru-32k.onmt.vocab
  target_vocabulary: data/wmt19-ruen-en-32k.onmt.vocab
  train_features_file:
  - data/train-nocrawl.sp32k.ru
  - data/train-nocrawl.wv200b.tf2.ru.gz
  train_labels_file: data/train-nocrawl.sp32k.en
eval:
  batch_size: 32
  external_evaluators: BLEU
infer:
  batch_size: 32
  length_bucket_width: 5
model_dir: 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 256
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 10000
  save_summary_steps: 100

2020-02-21 15:12:33.281049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-21 15:12:33.290734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:02:00.0
2020-02-21 15:12:33.291831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:81:00.0
2020-02-21 15:12:33.294807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-21 15:12:33.299452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-21 15:12:33.303918: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-21 15:12:33.307457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-21 15:12:33.312305: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-21 15:12:33.316914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-21 15:12:33.324542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-21 15:12:33.328831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-02-21 15:12:33.329327: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-21 15:12:33.341950: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599895000 Hz
2020-02-21 15:12:33.343915: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x43b52a0 executing computations on platform Host. Devices:
2020-02-21 15:12:33.343939: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-02-21 15:12:33.514518: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44182b0 executing computations on platform CUDA. Devices:
2020-02-21 15:12:33.514573: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla M40, Compute Capability 5.2
2020-02-21 15:12:33.514583: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): Tesla M40, Compute Capability 5.2
2020-02-21 15:12:33.516786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:02:00.0
2020-02-21 15:12:33.518515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:81:00.0
2020-02-21 15:12:33.518593: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-21 15:12:33.518626: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-21 15:12:33.518651: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-21 15:12:33.518676: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-21 15:12:33.518700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-21 15:12:33.518724: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-21 15:12:33.518753: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-21 15:12:33.525199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-02-21 15:12:33.525279: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-21 15:12:33.529175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-21 15:12:33.529209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2020-02-21 15:12:33.529224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2020-02-21 15:12:33.529235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
2020-02-21 15:12:33.534037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10772 MB memory) -> physical GPU (device: 0, name: Tesla M40, pci bus id: 0000:02:00.0, compute capability: 5.2)
2020-02-21 15:12:33.536312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10772 MB memory) -> physical GPU (device: 1, name: Tesla M40, pci bus id: 0000:81:00.0, compute capability: 5.2)
WARNING:tensorflow:No checkpoint to restore in 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt240
INFO:tensorflow:Training on 17819 examples
WARNING:tensorflow:From /home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/summary/summary_iterator.py:68: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
INFO:tensorflow:Accumulate gradients of 49 iterations to reach effective batch size of 25000
2020-02-21 15:12:42.078785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:02:00.0
2020-02-21 15:12:42.079455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:81:00.0
2020-02-21 15:12:42.079500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-21 15:12:42.079512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-21 15:12:42.079538: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-21 15:12:42.079556: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-21 15:12:42.079567: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-21 15:12:42.079587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-21 15:12:42.079608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-21 15:12:42.081475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-02-21 15:12:42.081535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-21 15:12:42.081544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2020-02-21 15:12:42.081550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2020-02-21 15:12:42.081554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
2020-02-21 15:12:42.082881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 10772 MB memory) -> physical GPU (device: 0, name: Tesla M40, pci bus id: 0000:02:00.0, compute capability: 5.2)
2020-02-21 15:12:42.083547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:1 with 10772 MB memory) -> physical GPU (device: 1, name: Tesla M40, pci bus id: 0000:81:00.0, compute capability: 5.2)
2020-02-21 15:12:43.152207: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 8961, in shape
    name, _ctx._post_execution_callbacks, input, "out_type", out_type)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gerd/bin/onmt240/bin/onmt-main", line 13, in <module>
    load_entry_point('OpenNMT-tf', 'console_scripts', 'onmt-main')()
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/bin/main.py", line 189, in main
    checkpoint_path=args.checkpoint_path)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/runner.py", line 198, in train
    mixed_precision=self._mixed_precision)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/training.py", line 44, in __init__
    self._model.create_variables()
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/models/model.py", line 286, in create_variables
    _ = self(features, labels=labels, training=True, step=0)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/models/model.py", line 95, in __call__
    return super(Model, self).__call__(*args, **kwargs)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/models/sequence_to_sequence.py", line 163, in call
    training=training)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/models/sequence_to_sequence.py", line 197, in _decode_target
    initial_state=encoder_state)
  File "/home/gerd/OpenNMT-tf_v2.4.0/opennmt/decoders/decoder.py", line 178, in initial_state
    batch_size = tf.shape(sentinel)[0]
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 432, in shape_v2
    return shape(input, name, out_type)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 458, in shape
    return shape_internal(input, name, optimize=True, out_type=out_type)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 486, in shape_internal
    return gen_array_ops.shape(input, name=name, out_type=out_type)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 8966, in shape
    input, out_type=out_type, name=name, ctx=_ctx)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 9005, in shape_eager_fallback
    _attr_T, (input,) = _execute.args_to_matching_eager([input], _ctx)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 257, in args_to_matching_eager
    t, dtype, preferred_dtype=default_dtype, ctx=ctx))
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/gerd/bin/onmt240/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.
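
A side note on the log above: the line "Accumulate gradients of 49 iterations to reach effective batch size of 25000" is consistent with dividing the effective batch size by the number of tokens processed per step across both GPUs and rounding up. The sketch below only reproduces that arithmetic. The function name is hypothetical, and the ceiling formula is an assumption inferred from the logged numbers, not copied from the OpenNMT-tf source.

```python
import math


def accumulation_steps(effective_batch_size, batch_size, num_replicas):
    """Hypothetical helper: number of batches to accumulate before one update."""
    # Assumption: every replica (GPU) consumes one batch of `batch_size` tokens per step.
    return math.ceil(effective_batch_size / (batch_size * num_replicas))


# Values from the logged configuration: 25000 tokens, 256 tokens per batch, 2 GPUs.
steps = accumulation_steps(25000, 256, 2)
print(steps)  # 49, the value reported in the log
```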

guillaumekln · February 25, 2020, 9:11am

Thanks for reporting. There is an issue when initializing the decoder state in the case of a multi source encoder.
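
To make the failure mode concrete: the last OpenNMT-tf frame in the traceback is `batch_size = tf.shape(sentinel)[0]` in `decoder.py`, and the final `ValueError` shows that the value being converted was `None`. The plain-Python sketch below is not the actual patch. It assumes, as a guess, that the sentinel was taken as the first entry of a nested state, and it uses hypothetical names with lists standing in for tensors to show how a batch size can be read while skipping `None` entries.

```python
def flatten(structure):
    """Yield the leaves of a nested tuple; anything that is not a tuple is a leaf."""
    if isinstance(structure, tuple):
        for item in structure:
            yield from flatten(item)
    else:
        yield structure


def infer_batch_size(state):
    """Hypothetical helper: batch size taken from the first leaf that is not None."""
    for leaf in flatten(state):
        if leaf is not None:
            return len(leaf)  # stand-in for tf.shape(leaf)[0]
    raise ValueError("cannot infer the batch size: every state entry is None")


# Stand-in for a two-source encoder state: lists play the role of tensors,
# and None marks a source that contributed no state (an assumption for this sketch).
state = (None, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

naive_sentinel = next(flatten(state))  # first leaf, whatever it is
print(naive_sentinel)                  # None: asking for its shape is what fails
print(infer_batch_size(state))         # 3
```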

Here’s the fix:

Gerd · February 25, 2020, 2:32pm

That fixed it. Thanks!

Gerd · February 25, 2020, 5:58pm

I spoke too soon. Training steps seem to work, but there’s now a “ValueError: None values not supported.” crash at eval and inference.

Here’s a log:

2020-02-25 12:09:14.047677: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda/lib64:/tools/SGE/lib/lx-amd64:/tools/openfst/lib
2020-02-25 12:09:14.047860: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda/lib64:/tools/SGE/lib/lx-amd64:/tools/openfst/lib
2020-02-25 12:09:14.047881: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
OpenNMT-tf 2.7.0
2020-02-25 12:09:18.545981: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda/lib64:/tools/SGE/lib/lx-amd64:/tools/openfst/lib
2020-02-25 12:09:18.546211: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:/usr/local/cuda/lib64:/tools/SGE/lib/lx-amd64:/tools/openfst/lib
2020-02-25 12:09:18.546237: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO:tensorflow:Using parameters:
data:
  eval_features_file:
  - data/valid.sp32k.ru
  - data/valid.wv200b.tf2.ru.gz
  eval_labels_file: data/valid.sp32k.en
  source_1_vocabulary: data/wmt19-ruen-ru-32k.onmt.vocab
  target_vocabulary: data/wmt19-ruen-en-32k.onmt.vocab
  train_features_file:
  - data/train-nocrawl.sp32k.ru
  - data/train-nocrawl.wv200b.tf2.ru.gz
  train_labels_file: data/train-nocrawl.sp32k.en
eval:
  batch_size: 32
  external_evaluators: BLEU
  steps: 200
infer:
  batch_size: 32
  length_bucket_width: 5
model_dir: 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt270B
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 256
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 200
  save_summary_steps: 100

2020-02-25 12:09:21.512390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 12:09:21.528401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: Tesla M40 computeCapability: 5.2
coreClock: 1.112GHz coreCount: 24 deviceMemorySize: 11.18GiB deviceMemoryBandwidth: 268.58GiB/s
2020-02-25 12:09:21.530255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:81:00.0 name: Tesla M40 computeCapability: 5.2
coreClock: 1.112GHz coreCount: 24 deviceMemorySize: 11.18GiB deviceMemoryBandwidth: 268.58GiB/s
2020-02-25 12:09:21.530727: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-25 12:09:21.534907: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-25 12:09:21.538876: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-25 12:09:21.539480: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-25 12:09:21.543806: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-25 12:09:21.546132: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-25 12:09:21.555609: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-25 12:09:21.561911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-02-25 12:09:21.562529: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 12:09:21.578060: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600035000 Hz
2020-02-25 12:09:21.581531: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x527f0f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 12:09:21.581565: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-02-25 12:09:21.771367: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x52e5680 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-25 12:09:21.771416: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla M40, Compute Capability 5.2
2020-02-25 12:09:21.771426: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla M40, Compute Capability 5.2
2020-02-25 12:09:21.773606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: Tesla M40 computeCapability: 5.2
coreClock: 1.112GHz coreCount: 24 deviceMemorySize: 11.18GiB deviceMemoryBandwidth: 268.58GiB/s
2020-02-25 12:09:21.775354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:81:00.0 name: Tesla M40 computeCapability: 5.2
coreClock: 1.112GHz coreCount: 24 deviceMemorySize: 11.18GiB deviceMemoryBandwidth: 268.58GiB/s
2020-02-25 12:09:21.775434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-25 12:09:21.775472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-25 12:09:21.775502: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-25 12:09:21.775536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-25 12:09:21.775568: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-25 12:09:21.775600: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-25 12:09:21.775633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-25 12:09:21.782391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-02-25 12:09:21.782470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-25 12:09:21.786409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-25 12:09:21.786444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1
2020-02-25 12:09:21.786465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N N
2020-02-25 12:09:21.786478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: N N
2020-02-25 12:09:21.791461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10772 MB memory) -> physical GPU (device: 0, name: Tesla M40, pci bus id: 0000:02:00.0, compute capability: 5.2)
2020-02-25 12:09:21.793776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10772 MB memory) -> physical GPU (device: 1, name: Tesla M40, pci bus id: 0000:81:00.0, compute capability: 5.2)
WARNING:tensorflow:No checkpoint to restore in 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt270B
WARNING:tensorflow:From /home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/summary/summary_iterator.py:68: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
INFO:tensorflow:Accumulate gradients of 49 iterations to reach effective batch size of 25000
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
INFO:tensorflow:Training on 17819 examples
WARNING:tensorflow:From /home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-02-25 12:12:21.671012: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-25 12:12:25.306608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
INFO:tensorflow:batch_all_reduce: 418 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 418 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:Saved checkpoint 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt270B/ckpt-1
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Step = 100 ; steps/s = 0.14, target words/s = 2003 ; Learning rate = 0.000012 ; Loss = 9.768065
INFO:tensorflow:Step = 200 ; steps/s = 0.20, target words/s = 2971 ; Learning rate = 0.000025 ; Loss = 8.787529
INFO:tensorflow:Saved checkpoint 2src_sp32k_200_6L_fulltrain_nogrowth_tf2_onmt270B/ckpt-200
INFO:tensorflow:Running evaluation for step 200
2020-02-25 12:29:59.573035: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
2020-02-25 12:29:59.574394: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Traceback (most recent call last):
  File "/home/gerd/bin/onmt270/bin/onmt-main", line 11, in <module>
    load_entry_point('OpenNMT-tf', 'console_scripts', 'onmt-main')()
  File "/home/gerd/OpenNMT-tf_v2.7.0/opennmt/bin/main.py", line 204, in main
    checkpoint_path=args.checkpoint_path)
  File "/home/gerd/OpenNMT-tf_v2.7.0/opennmt/runner.py", line 208, in train
    moving_average_decay=train_config.get("moving_average_decay"))
  File "/home/gerd/OpenNMT-tf_v2.7.0/opennmt/training.py", line 104, in __call__
    early_stop = self._evaluate(evaluator, step, moving_average=moving_average)
  File "/home/gerd/OpenNMT-tf_v2.7.0/opennmt/training.py", line 182, in _evaluate
    evaluator(step)
  File "/home/gerd/OpenNMT-tf_v2.7.0/opennmt/evaluation.py", line 268, in __call__
    loss, predictions = self._eval_fn(source, target)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in converted code:
/home/gerd/OpenNMT-tf_v2.7.0/opennmt/models/model.py:137 evaluate  *
    outputs, predictions = self(features, labels=labels)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py:778 __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
/home/gerd/OpenNMT-tf_v2.7.0/opennmt/models/sequence_to_sequence.py:165 call  *
    predictions = self._dynamic_decode(
/home/gerd/OpenNMT-tf_v2.7.0/opennmt/models/sequence_to_sequence.py:231 _dynamic_decode  *
    encoder_state = tfa.seq2seq.tile_batch(encoder_state, beam_size)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_addons/seq2seq/beam_search_decoder.py:124 tile_batch  *
    return tf.nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/util/nest.py:568 map_structure
    structure[0], [func(*x) for x in entries],
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_addons/seq2seq/beam_search_decoder.py:82 _tile_batch  *
    t = tf.convert_to_tensor(t, name="t")
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1256 convert_to_tensor_v2
    as_ref=False)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1314 convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:317 _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:258 constant
    allow_broadcast=True)
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:296 _constant_impl
    allow_broadcast=allow_broadcast))
/home/gerd/bin/onmt270/lib64/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py:439 make_tensor_proto
    raise ValueError("None values not supported.")

ValueError: None values not supported.

guillaumekln · February 26, 2020, 8:56am

Thanks for the test. This should fix the inference part:

Let me know if you find other issues.
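
For readers following the eval traceback: it ends in `tfa.seq2seq.tile_batch(encoder_state, beam_size)`, which maps a tensor conversion over every entry of the encoder state and rejects `None` ("None values not supported."). As an illustration only (this is not the patch referred to above, and every name here is hypothetical), the plain-Python sketch below tiles a nested state for beam search while passing `None` entries through unchanged. Lists stand in for tensors.

```python
def tile_batch(entries, multiplier):
    """Repeat each batch entry `multiplier` times: [a, b] becomes [a, a, b, b] for 2."""
    return [entry for entry in entries for _ in range(multiplier)]


def tile_state(state, multiplier):
    """Hypothetical helper: tile every leaf of a nested tuple, keeping None as is."""
    if state is None:
        return None
    if isinstance(state, tuple):
        return tuple(tile_state(item, multiplier) for item in state)
    return tile_batch(state, multiplier)


# Batch of two sentences and a beam width of 4, as in the logged configuration.
encoder_state = (None, ["s0", "s1"])
tiled = tile_state(encoder_state, 4)
print(tiled[0])       # None is passed through instead of being converted
print(len(tiled[1]))  # 8 entries: each sentence state repeated 4 times
```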

Gerd · February 26, 2020, 3:54pm

It seems to be working. Thanks!