AttributeError: 'SequenceRecordInputter' object has no attribute 'input_depth'


I’m running into problems using a multisource model with one source being a TFRecord (each line of text corresponds to a sequence of floating-point vectors) and the other source being text.

I’m using OpenNMT-tf v1.21.0, Python 3.4, and TensorFlow 1.13.1.

My current problem is shown in the log at the bottom of the post: “AttributeError: ‘SequenceRecordInputter’ object has no attribute ‘input_depth’”. Any help would be appreciated.

Lesser problems (not of immediate concern, but might be related to the above):

  1. A compressed (GZIP or ZLIB) TFRecord file does not seem to load properly (“tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0”). The vectors are sparse, so compression would help a lot.
  2. I had to set source embedding_size equal to the depth of the TFRecord’s vectors to get it to work. I was hoping for a larger embedding size than the vector’s 400-dimensional representation.

Thanks in advance!

Here is some I/O. I’m happy to provide more info or run some experiments on my end.


onmt-main train_and_eval \
  --config config/enmt_wordshape_transformer.yml \
  --model config/model/ \
  --auto_config \
  --gpu_allow_growth \
  --num_gpus 2


model_dir: srctgt_wordshape_transformer

  train_features_file: [data/train.wv200.src, data/train.sp32k.src]
  train_labels_file: data/train.sp32k.tgt
  eval_features_file: [data/valid.wv200.src, data/valid.sp32k.src]
  eval_labels_file: data/valid.sp32k.tgt
  source_words_vocabulary: data/enmt-src-32k.onmt.vocab
  target_words_vocabulary: data/enmt-tgt-32k.onmt.vocab

  save_checkpoints_steps: 1000
  exporters: last

  eval_delay: 3600  # Every 1 hour
  external_evaluators: BLEU

  batch_size: 32


import opennmt as onmt

def model():
  return onmt.models.Transformer(


WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.
INFO:tensorflow:Using parameters:
  - data/valid.wv200.src
  - data/valid.sp32k.src
  eval_labels_file: data/valid.sp32k.tgt
  source_words_vocabulary: data/srctgt-src-32k.onmt.vocab
  target_words_vocabulary: data/srctgt-tgt-32k.onmt.vocab
  - data/train.wv200.src
  - data/train.sp32k.src
  train_labels_file: data/train.sp32k.tgt
  batch_size: 32
  eval_delay: 3600
  exporters: last
  external_evaluators: BLEU
  batch_size: 32
  bucket_width: 5
model_dir: srctgt_wordshape_transformer
  average_loss_in_time: true
  beam_width: 4
    model_dim: 512
    warmup_steps: 8000
  decay_type: noam_decay_v2
  label_smoothing: 0.1
  learning_rate: 2.0
  length_penalty: 0.6
  optimizer: LazyAdamOptimizer
    beta1: 0.9
    beta2: 0.998
  batch_size: 64
  average_last_checkpoints: 8
  batch_size: 3072
  batch_type: tokens
  bucket_width: 1
  effective_batch_size: 25000
  exporters: last
  keep_checkpoint_max: 8
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 1000
  save_summary_steps: 100
  train_steps: 500000

INFO:tensorflow:Accumulate gradients of 5 iterations to reach effective batch size of 25000
INFO:tensorflow:loss = 8.203028, step = 0
INFO:tensorflow:loss = 7.2150645, step = 100 (149.425 sec)
INFO:tensorflow:loss = 6.8113074, step = 200 (143.945 sec)
INFO:tensorflow:loss = 6.4717984, step = 300 (143.768 sec)
INFO:tensorflow:loss = 6.194136, step = 400 (143.209 sec)
INFO:tensorflow:loss = 5.8775764, step = 500 (143.132 sec)
INFO:tensorflow:loss = 5.720784, step = 600 (145.018 sec)
INFO:tensorflow:loss = 5.582198, step = 700 (141.451 sec)
INFO:tensorflow:loss = 5.302073, step = 800 (143.275 sec)
INFO:tensorflow:loss = 5.1950145, step = 900 (142.375 sec)
INFO:tensorflow:Saving checkpoints for 1000 into srctgt_wordshape_transformer/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.

This regression was fixed in v1.21.4. The error occurs when trying to export the model.

The dataset instance is currently constructed with the default arguments (i.e. no compression). I will look if compression options can be cleanly exposed.
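To illustrate why a reader that assumes no compression reports a corrupted record at offset 0, here is a minimal, standard-library-only sketch of the TFRecord framing (an 8-byte little-endian length, a masked CRC-32C of that length, the payload, and a masked CRC-32C of the payload). It is not TensorFlow's reader: the helper names `encode_record` and `first_record_is_valid` are hypothetical, and only the first length check is mimicked.

```python
import gzip
import struct


def _make_crc32c_table():
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC_TABLE = _make_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    # TFRecord stores a rotated and offset CRC rather than the raw value.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload):
    # Hypothetical helper: frame one payload the way a TFRecord file does.
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def first_record_is_valid(raw):
    # Hypothetical helper: check that the CRC stored after the 8-byte
    # length field matches the length field itself.
    if len(raw) < 12:
        return False
    header = raw[:8]
    (stored_crc,) = struct.unpack("<I", raw[8:12])
    return stored_crc == masked_crc(header)


plain = encode_record(b"example payload")
compressed = gzip.compress(plain)

print(first_record_is_valid(plain))                        # framing read as written
print(first_record_is_valid(compressed))                   # gzip header misread as a length field
print(first_record_is_valid(gzip.decompress(compressed)))  # framing restored after decompression
```

When the file is gzip-compressed, the first bytes are the gzip header rather than a record length, so the very first CRC check fails unless the reader is told to decompress first.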

Hmm, I think this should work for multi-source training. What was the error?


Updating to 1.21.6 fixed my input_depth problem. I could not replicate the embedding_size problem, so that must have been a figment of my imagination.

I would be happy to have compressed TFRecords supported. I tried hard-coding a hack to enable this (adding compression_type=“GZIP” to the return statement), but it did not work for me.

Thanks again,

That should be the way to do it. Just to make sure, to generate the compressed record file you configured the options argument of the TFRecordWriter, right?

That sounds like what I tried:

import tensorflow as tf
import opennmt as onmt
import numpy as np


while line:

  # Define vectorList, a list of vectors (numpy arrays) for the line



The file is definitely compressed (100x compression). I’ll try the hard-coded fix again when I have fewer systems in progress. I don’t think I tried it after the latest update.


Reading compressed TFRecords now works for me with the changes below in record_inputter. Compression options are not cleanly exposed, so this “solution” is bad for people who prefer not to compress with gzip.

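  def make_dataset(self, data_file, training=None):
    # `options` is not defined in the lines shown here; it is assumed to be
    # the GZIP TFRecord options object set up elsewhere in the modified file.
    first_record = next(compat.tf_compat(v1="python_io.tf_record_iterator")(data_file, options))
    first_record = tf.train.Example.FromString(first_record)
    # The last entry of the stored "shape" feature is the vector depth.
    shape = first_record.features.feature["shape"].int64_list.value
    self.input_depth = shape[-1]
    # The rest of the method (building and returning the dataset) is not shown.

  def get_dataset_size(self, data_file):
    # Count the records by iterating over the file with the same options.
    return sum(1 for _ in compat.tf_compat(v1="python_io.tf_record_iterator")(data_file, options))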
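
One possible way to avoid hard-coding gzip is to guess the compression from the first two bytes of the file and pass the result on as the compression type. The sketch below uses only the standard library; `guess_compression_type` is a hypothetical helper, not part of OpenNMT-tf. It returns "GZIP", "ZLIB", or "" (the strings that tf.data.TFRecordDataset accepts for compression_type). The check is heuristic: an uncompressed file whose first record length happens to look like a gzip or zlib header would be misclassified.

```python
import gzip
import os
import tempfile
import zlib


def guess_compression_type(path):
    """Guess a TFRecord compression type string from the file's first two bytes."""
    with open(path, "rb") as f:
        head = f.read(2)
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if head == b"\x1f\x8b":
        return "GZIP"
    # A zlib stream starts with a CMF byte whose low nibble is 8 (deflate),
    # and the two header bytes, read as a big-endian number, are a multiple of 31.
    if len(head) == 2 and head[0] & 0x0F == 8 and (head[0] * 256 + head[1]) % 31 == 0:
        return "ZLIB"
    # Otherwise assume the file is uncompressed.
    return ""


# Small demonstration on throwaway files. The payload is a stand-in that only
# mimics the first bytes of an uncompressed record file (a little-endian length).
payload = b"\x0f" + b"\x00" * 7 + b"not a real record"
samples = {
    "plain.records": payload,
    "gzip.records": gzip.compress(payload),
    "zlib.records": zlib.compress(payload),
}
guesses = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(content)
        guesses[name] = guess_compression_type(path)
print(guesses)
```

An explicit configuration option would be more robust than sniffing bytes; this only shows that the three cases can usually be told apart without extra user input.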