ValueError: Tensor’s shape is not compatible with supplied shape

Hello fellow researchers,
Greetings,

I tried to run the multi-feature model with the embedding size reduced from 512 to 128, but I got the error below:

Traceback (most recent call last):
File “/opt/conda/envs/tf_v2_env/bin/onmt-main”, line 8, in
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/bin/”, line 189, in main
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/”, line 198, in train
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/”, line 44, in init
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/”, line 286, in create_variables
_ = self(features, labels=labels, training=True, step=0)
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/”, line 95, in call
return super(Model, self).call(*args, **kwargs)
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/”, line 887, in call
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/”, line 2141, in _maybe_build
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/”, line 141, in build
super(SequenceToSequence, self).build(input_shape)
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/”, line 89, in build
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/”, line 342, in build
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/”, line 342, in build
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/”, line 416, in build
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/”, line 522, in add_weight
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/”, line 725, in _add_variable_with_custom_getter
name=name, shape=shape)
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/”, line 792, in _preload_simple_restoration
checkpoint_position=checkpoint_position, shape=shape)
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/”, line 75, in init
File “/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/framework/”, line 1074, in set_shape
(self.shape, shape))
ValueError: Tensor’s shape (94949, 512) is not compatible with supplied shape [94949, 128]

My model function for multi_features_transformer is (the indentation is correct in my file):

def model():
return onmt.models.Transformer(

Please help me resolve this issue. What goes wrong when I change the embedding size from 512 to 128?


This is my YAML file:

model_dir: folder/run/

    - folder/f1
    - folder/f2
    - folder/f2
  train_labels_file: folder/t1
  source_1_vocabulary: folder/f1_vocab
  source_2_vocabulary: folder/f2_vocab
  source_3_vocabulary: folder/f3_vocab
  target_vocabulary: folder/t1_vocab

  # (required for train_end_eval and eval run types).
    - folder/f1_val
    - folder/f2_val
    - folder/f3_val
  eval_labels_file: folder/t1_val

  # (optional) For language models, configure sequence control tokens (usually
  # represented as <s> and </s>). For example, enabling "start" and disabling "end"
  # allows nonconditional and unbounded generation (default: start=false, end=true).
  # Advanced users could also configure this parameter for seq2seq models with e.g.
  # source_sequence_controls and target_sequence_controls.
    start: true
    end: true

# Model and optimization parameters.
  # The optimizer class name in tf.keras.optimizers or tfa.optimizers.
  optimizer: Adam
  # (optional) Additional optimizer parameters as defined in their documentation.
  # If weight_decay is set, the optimizer will be extended with decoupled weight decay.
    beta_1: 0.8
    beta_2: 0.998
  learning_rate: 1.0

  # (optional) If set, overrides all dropout values configured in the model definition.
  dropout: 0.1

  # (optional) List of layer to not optimize.
    - "encoder/layers/0"
    - "decoder/output_layer"

  # (optional) Weights regularization penalty (default: null).
    type: l2  # can be "l1", "l2", "l1_l2" (case-insensitive).
    scale: 1e-4  # if using "l1_l2" regularization, this should be a YAML list.

  # (optional) Average loss in the time dimension in addition to the batch dimension
  # (default: true when using "tokens" batch type, false otherwise).
  average_loss_in_time: false

  # (optional) The type of learning rate decay (default: null). See:
  #  *
  #  * opennmt/schedules/
  # This value may change the semantics of other decay options. See the documentation or the code.
  decay_type: NoamDecay
  # (optional unless decay_type is set) Decay parameters.
    model_dim: 128
    warmup_steps: 100

  # (optional) The learning rate minimum value (default: 0).
  minimum_learning_rate: 0.00001

  label_smoothing: 0.1

  # (optional) Width of the beam search (default: 1).
  beam_width: 5
  # (optional) Number of hypotheses to return (default: 1). Set 0 to return all
  # available hypotheses. This value is also set by infer/n_best.
  num_hypotheses: 1
  # (optional) Length penalty weight to use during beam search (default: 0).
  length_penalty: 0.2
  # (optional) Coverage penalty weight to use during beam search (default: 0).
  coverage_penalty: 0.2
  # (optional) Sample predictions from the top K most likely tokens (requires
  # beam_width to 1). If 0, sample from the full output distribution (default: 1).
  sampling_topk: 1
  # (optional) High temperatures generate more random samples (default: 1).
  sampling_temperature: 1
  # (optional) Sequence of noise to apply to the decoding output. Each element
  # should be a noise type (can be: "dropout", "replacement", "permutation") and
  # the module arguments
  # (see
    - dropout: 0.1
    - replacement: [0.1, ⦅unk⦆]
    - permutation: 3
  # (optional) Define the subword marker. This is useful to apply noise at the
  # word level instead of the subword level (default: ■).

  # (optional) Replace unknown target tokens by the original source token with the
  # highest attention (default: false).
  replace_unknown_target: true

  # (optional) The type of guided alignment cost to compute (can be: "null", "ce", "mse",
  # default: "null").
  guided_alignment_type: mse
  # (optional) The weight of the guided alignment cost (default: 1).
  guided_alignment_weight: 1

  # (optional) Enable contrastive learning mode, see
  # (default: false).
  # See also "decoding_subword_token" that is used by this mode.
  contrastive_learning: false
  # (optional) The value of the parameter eta in the max-margin loss (default: 0.1).
  max_margin_eta: 0.1

# Training options.
  # (optional when batch_type=tokens) If not set, the training will search the largest
  # possible batch size.
  batch_size: 64
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: tokens
  # (optional) Tune gradient accumulation to train with at least this effective batch size
  # (default: null).
  effective_batch_size: 25000

  # (optional) Save a checkpoint every this many steps (default: 5000).
  save_checkpoints_steps: 10000
  # (optional) How many checkpoints to keep on disk.
  keep_checkpoint_max: 3

  # (optional) Dump summaries and logs every this many steps (default: 100).
  save_summary_steps: 100

  # (optional) Maximum training step. If not set, train forever.
  max_step: 1000000
  # (optional) If true, makes a single pass over the training data (default: false).
  single_pass: false

  # (optional) The maximum length of feature sequences during training (default: null).
  maximum_features_length: 70
  # (optional) The maximum length of label sequences during training (default: null).
  maximum_labels_length: 70

  # (optional) The width of the length buckets to select batch candidates from.
  # A smaller value means less padding and increased efficiency. (default: 1).
  length_bucket_width: 1

  # (optional) The number of elements from which to sample during shuffling (default: 500000).
  # Set 0 or null to disable shuffling, -1 to match the number of training examples.
  sample_buffer_size: 500000

  # (optional) Number of checkpoints to average at the end of the training to the directory
  # model_dir/avg (default: 0).
  average_last_checkpoints: 8

# (optional) Evaluation options.
  # (optional) The batch size to use (default: 32).
  batch_size: 30

  # (optional) Evaluate every this many steps (default: 5000).
  steps: 5000

  # (optional) Save evaluation predictions in model_dir/eval/.
  save_eval_predictions: false
  # (optional) Evaluator or list of evaluators that are called on the saved evaluation predictions.
  # Available evaluators: bleu, rouge
  external_evaluators: bleu

  # (optional) Export a SavedModel when a metric has the best value so far (default: null).
  export_on_best: bleu

  # (optional) Early stopping condition.
  # Should be read as: stop the training if "metric" did not improve more
  # than "min_improvement" in the last "steps" evaluations.
    # (optional) The target metric name (default: "loss").
    metric: bleu
    # (optional) The metric should improve at least by this much to be considered as an improvement (default: 0)
    min_improvement: 0.01
    steps: 4

# (optional) Inference options.
  # (optional) The batch size to use (default: 1).
  batch_size: 10

  # (optional) For compatible models, the number of hypotheses to output (default: 1).
  # This sets the parameter params/num_hypotheses.
  n_best: 1
  # (optional) For compatible models, also output the score (default: false).
  with_scores: false
  # (optional) For compatible models, also output the alignments (can be: null, hard, soft,
  # default: null).
  with_alignments: null

  # (optional) The width of the length buckets to select batch candidates from.
  # If set, the test data will be sorted by length to increase the translation
  # efficiency. The predictions will still be outputted in order as they are
  # available (default: 0).
  length_bucket_width: 5

# (optional) Scoring options.
  # (optional) The batch size to use (default: 64).
  batch_size: 64
  # (optional) Also report token-level cross entropy.
  with_token_level: false
  # (optional) Also output the alignments (can be: null, hard, soft, default: null).
  with_alignments: null

Most likely you have a checkpoint in folder/run with the previous size of 512. You need to remove it or change model_dir.

Also, I strongly recommend not copying every existing parameter into the YAML file. Set only what you need (see e.g. the quickstart).
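To check the first point, whether model_dir still holds an old checkpoint, a minimal sketch in plain Python could look like the following. It assumes the usual TensorFlow checkpoint layout (a state file named `checkpoint` plus `*.index` and `*.data-*` weight files); the helper name `find_checkpoint_files` is made up for this example and is not part of OpenNMT-tf.

```python
from pathlib import Path


def find_checkpoint_files(model_dir):
    """Return names of files in model_dir that look like TensorFlow checkpoint files."""
    root = Path(model_dir)
    if not root.is_dir():
        # A missing directory cannot contain an old checkpoint.
        return []
    found = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        name = path.name
        # "checkpoint" is the state file; "*.index" and "*.data-*" hold the weights.
        if name == "checkpoint" or name.endswith(".index") or ".data-" in name:
            found.append(name)
    return found


if __name__ == "__main__":
    # "folder/run" is the model_dir from the configuration in this thread.
    leftovers = find_checkpoint_files("folder/run")
    if leftovers:
        print("Old checkpoint files found:", leftovers)
    else:
        print("No checkpoint files found.")
```

If the list is not empty, either delete those files or point model_dir at a new directory before training with the smaller embedding size.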

I can add that I’ve been getting excellent results training a Transformer model using the quickstart with --auto_config.

My model folder was empty when I ran the training, and I still got this error.

I am also running the training with --auto_config, and I pass the multi-feature Transformer model with --model:

onmt-main --model config/models/ --config data.yml --auto_config train --num_gpus 1

Can you post the complete logs if the error is still happening?

The error is resolved. The model folder must be empty (no checkpoint from an earlier run) before training with a new model configuration.

WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.
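What "the same model" means in this warning can be illustrated with a small sketch: a variable saved with one shape cannot be restored into a variable built with another. The function and the variable name `source_embedding` below are hypothetical; only the two shapes come from the error message in this thread. With TensorFlow installed, the saved shapes could be read with `tf.train.list_variables(model_dir)`, which returns (name, shape) pairs.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (saved_shape, new_shape)} for variables whose shapes differ."""
    mismatches = {}
    for name, new_shape in model_shapes.items():
        saved_shape = checkpoint_shapes.get(name)
        # Variables missing from the checkpoint are skipped; only conflicts are reported.
        if saved_shape is not None and tuple(saved_shape) != tuple(new_shape):
            mismatches[name] = (tuple(saved_shape), tuple(new_shape))
    return mismatches


# Hypothetical variable name; the shapes are taken from the ValueError above.
saved = {"source_embedding": (94949, 512)}   # shape stored in the old checkpoint
wanted = {"source_embedding": (94949, 128)}  # shape requested by the new model
print(find_shape_mismatches(saved, wanted))
```

A change such as dropout does not alter any variable shape, so it would produce no mismatch here, while a change of embedding size does.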

Thanks @guillaumekln