I get the following error when I run OpenNMT-tf with the attached files (data.yaml and custom_model.py, both included below). The model takes two inputs through ParallelInputter, but I don’t understand what is causing the error. Can you tell me whether data.yaml or custom_model.py is missing something?
2021-08-11 13:38:53.036000: I main.py:326] Using parameters:
data:
  eval_features_file:
  - qcr/src1_test.txt
  - qcr/src2_test.txt
  eval_labels_file: qcr/tgt_test.txt
  source_1_vocabulary: qcr/src1.vocab
  source_2_vocabulary: qcr/src2.vocab
  target_vocabulary: qcr/tgt.vocab
  train_features_file:
  - qcr/src1_train.txt
  - qcr/src2_train.txt
  train_labels_file: qcr/tgt_train.txt
eval:
  batch_size: 2
  batch_type: examples
  eval_delay: 3600
  external_evaluators: BLEU
  length_bucket_width: 5
infer:
  batch_size: 2
  batch_type: examples
  length_bucket_width: 5
model_dir: qcr/run/
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 2
  batch_type: examples
  length_bucket_width: 5
train:
  average_last_checkpoints: 8
  batch_size: 2
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 5000
  maximum_features_length:
  - 100
  - 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 1000
  save_summary_steps: 100
2021-08-11 13:38:53.241000: I inputter.py:309] Initialized source_1 input layer:
2021-08-11 13:38:53.241000: I inputter.py:309] - vocabulary size: 13001
2021-08-11 13:38:53.241000: I inputter.py:309] - special tokens: BOS=no, EOS=no
2021-08-11 13:38:53.251000: I inputter.py:309] Initialized source_2 input layer:
2021-08-11 13:38:53.251000: I inputter.py:309] - vocabulary size: 2499
2021-08-11 13:38:53.251000: I inputter.py:309] - special tokens: BOS=no, EOS=no
2021-08-11 13:38:53.299000: I inputter.py:309] Initialized target input layer:
2021-08-11 13:38:53.299000: I inputter.py:309] - vocabulary size: 13001
2021-08-11 13:38:53.299000: I inputter.py:309] - special tokens: BOS=yes, EOS=yes
2021-08-11 13:38:53.431000: W runner.py:242] No checkpoint to restore in qcr/run/
2021-08-11 13:38:53.433000: W deprecation.py:336] From /opt/conda/lib/python3.7/site-packages/tensorflow/python/summary/summary_iterator.py:31: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2021-08-11 13:38:54.989000: I main.py:326] Accumulate gradients of 12500 iterations to reach effective batch size of 25000
2021-08-11 13:38:55.032000: I dataset_ops.py:2120] Training on 912418 examples
2021-08-11 13:38:55.941997: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-11 13:38:55.945570: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299995000 Hz
2021-08-11 13:39:12.030000: I control_flow.py:1225] Number of model parameters: 71712457
2021-08-11 13:39:12.877000: I control_flow.py:1225] Number of model weights: 321 (trainable = 321, non trainable = 0)
2021-08-11 13:39:43.039162: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-11 13:39:43.415667: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-08-11 13:39:43.416777: W ./tensorflow/stream_executor/stream.h:2140] attempting to perform DNN operation using StreamExecutor without DNN support
2021-08-11 13:39:43.419378: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-08-11 13:39:43.420335: W ./tensorflow/stream_executor/stream.h:2140] attempting to perform DNN operation using StreamExecutor without DNN support
2021-08-11 13:39:43.423007: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-08-11 13:39:43.423975: W ./tensorflow/stream_executor/stream.h:2140] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
  File "/opt/conda/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/opennmt/bin/main.py", line 326, in main
    hvd=hvd,
  File "/opt/conda/lib/python3.7/site-packages/opennmt/runner.py", line 281, in train
    moving_average_decay=train_config.get("moving_average_decay"),
  File "/opt/conda/lib/python3.7/site-packages/opennmt/training.py", line 123, in __call__
    dataset, accum_steps=accum_steps, report_steps=report_steps
  File "/opt/conda/lib/python3.7/site-packages/opennmt/training.py", line 260, in _steps
    loss = forward_fn()
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3024, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([1,2,512,1])
[[node dual_source_transformer_1/parallel_encoder_1/self_attention_encoder_2/self_attention_encoder_layer_12/transformer_layer_wrapper_48/layer_norm_52/FusedBatchNormV3_1 (defined at /lib/python3.7/site-packages/opennmt/layers/common.py:128) ]]
[[Func/gradients/global_norm/write_summary/summary_cond/then/_328/input/_987/_62]]
(1) Internal: cuDNN launch failure : input shape ([1,2,512,1])
[[node dual_source_transformer_1/parallel_encoder_1/self_attention_encoder_2/self_attention_encoder_layer_12/transformer_layer_wrapper_48/layer_norm_52/FusedBatchNormV3_1 (defined at /lib/python3.7/site-packages/opennmt/layers/common.py:128) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__forward_48335]
Function call stack:
_forward -> _forward
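The cuDNN messages in the log above state the version rule directly: the library loaded at runtime must have the same major version as the one TensorFlow was compiled with, and an equal or higher minor version. A minimal sketch of that check, using the two versions from the log (the helper names parse_version and cudnn_is_compatible are hypothetical and not part of TensorFlow):

```python
def parse_version(version):
    """Split a dotted version string such as "8.0.5" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def cudnn_is_compatible(loaded, compiled):
    """Apply the rule quoted in the log: same major, loaded minor >= compiled minor."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# Versions reported in the log: runtime 8.0.5, compiled against 8.1.0.
print(cudnn_is_compatible("8.0.5", "8.1.0"))  # False
```

With these values the check fails, which would be consistent with the "without DNN support" warnings and the cuDNN launch failure reported at the first layer normalization op.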
data.yaml file:
model_dir: qcr/run/
data:
  train_features_file:
    - qcr/src1_train.txt
    - qcr/src2_train.txt
  train_labels_file: qcr/tgt_train.txt
  eval_features_file:
    - qcr/src1_test.txt
    - qcr/src2_test.txt
  eval_labels_file: qcr/tgt_test.txt
  source_1_vocabulary: qcr/src1.vocab
  source_2_vocabulary: qcr/src2.vocab
  target_vocabulary: qcr/tgt.vocab
train:
  batch_size: 2
  save_checkpoints_steps: 1000
  max_step: 5000
eval:
  eval_delay: 3600 # Every 1 hour
  batch_size: 2
  external_evaluators: BLEU
score:
  batch_size: 2
infer:
  batch_size: 2
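Note that data.yaml sets only batch_size under train, while the logged parameters show batch_type: tokens and effective_batch_size: 25000 for training, and the log reports 12500 gradient accumulation iterations. The arithmetic behind that number can be sketched as follows. The formula is an assumption that reproduces the logged value, and accumulation_steps is a hypothetical name, not an OpenNMT-tf function:

```python
import math


def accumulation_steps(batch_size, effective_batch_size, num_replicas=1):
    """Number of batches to accumulate to reach the effective batch size.

    Assumed formula: ceil(effective_batch_size / (batch_size * num_replicas)).
    """
    return math.ceil(effective_batch_size / (batch_size * num_replicas))


# Values from the logged parameters: batch_size 2, effective_batch_size 25000.
print(accumulation_steps(2, 25000))  # 12500
```

Because the training batch type is tokens, a batch_size of 2 here means two tokens per batch rather than two examples, if the logged parameters are read literally.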
custom_model.py file:
import opennmt
from opennmt.utils import misc
import argparse
import logging
import tensorflow as tf
import tensorflow_addons as tfa

tf.get_logger().setLevel(logging.INFO)


class DualSourceTransformer(opennmt.models.Transformer):
    """Transformer that reads two parallel source files."""

    def __init__(self):
        super().__init__(
            # One WordEmbedder per source; vocabularies come from
            # source_1_vocabulary and source_2_vocabulary in data.yaml.
            source_inputter=opennmt.inputters.ParallelInputter(
                [
                    opennmt.inputters.WordEmbedder(embedding_size=512),
                    opennmt.inputters.WordEmbedder(embedding_size=512),
                ]
            ),
            target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
            num_layers=6,
            num_units=512,
            num_heads=8,
            ffn_inner_dim=2048,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            # Use the same encoder parameters for both sources.
            share_encoders=True,
        )

    def auto_config(self, num_replicas=1):
        config = super().auto_config(num_replicas=num_replicas)
        # Apply the default maximum feature length to each of the two sources.
        max_length = config["train"]["maximum_features_length"]
        return misc.merge_dict(
            config, {"train": {"maximum_features_length": [max_length, max_length]}}
        )


# Model entry point read from this file.
model = DualSourceTransformer
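The auto_config override above replaces the single default maximum_features_length with a two-element list, one entry per source, which matches the two values of 100 shown under train in the logged parameters. A minimal sketch of that merge, using a hypothetical stand-in merge_nested rather than opennmt.utils.misc.merge_dict itself, and assuming a default length of 100 as the log suggests:

```python
def merge_nested(base, override):
    """Recursively copy values from override into base and return base."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge_nested(base[key], value)
        else:
            base[key] = value
    return base


# Assumed defaults for illustration only; the real values come from auto_config.
config = {"train": {"maximum_features_length": 100, "maximum_labels_length": 100}}
max_length = config["train"]["maximum_features_length"]
merged = merge_nested(
    config, {"train": {"maximum_features_length": [max_length, max_length]}}
)
print(merged["train"]["maximum_features_length"])  # [100, 100]
```

Only the overridden key changes; the other train settings in the sketch are left untouched.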