ValueError: Tensor’s shape is not compatible with supplied shape

ishaansharma · December 14, 2019, 10:18am

Hello fellow researchers,
Greetings,

I tried to run the multi-feature Transformer after reducing the embedding size from 512 to 128, but I got the error below:

Traceback (most recent call last):
  File "/opt/conda/envs/tf_v2_env/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/bin/main.py", line 189, in main
    checkpoint_path=args.checkpoint_path)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/runner.py", line 198, in train
    mixed_precision=self._mixed_precision)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/training.py", line 44, in __init__
    self._model.create_variables()
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/model.py", line 286, in create_variables
    _ = self(features, labels=labels, training=True, step=0)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/model.py", line 95, in __call__
    return super(Model, self).__call__(*args, **kwargs)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 887, in __call__
    self._maybe_build(inputs)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2141, in _maybe_build
    self.build(input_shapes)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 141, in build
    super(SequenceToSequence, self).build(input_shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/models/model.py", line 89, in build
    self.examples_inputter.build(input_shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/inputter.py", line 342, in build
    inputter.build(input_shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/inputter.py", line 342, in build
    inputter.build(input_shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/opennmt/inputters/text_inputter.py", line 416, in build
    trainable=self.trainable)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 522, in add_weight
    aggregation=aggregation)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 725, in _add_variable_with_custom_getter
    name=name, shape=shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 792, in _preload_simple_restoration
    checkpoint_position=checkpoint_position, shape=shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 75, in __init__
    self.wrapped_value.set_shape(shape)
  File "/opt/conda/envs/tf_v2_env/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1074, in set_shape
    (self.shape, shape))
ValueError: Tensor's shape (94949, 512) is not compatible with supplied shape [94949, 128]
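
For readers following along, the failing check can be approximated in plain Python. This is only a sketch: `shapes_compatible` is a hypothetical helper written for illustration, not a TensorFlow function. The last frames of the traceback (`_preload_simple_restoration`, `checkpoint_position`) suggest that the (94949, 512) shape belongs to a value being restored, while [94949, 128] is what the new model asks for.

```python
from typing import Optional, Sequence


def shapes_compatible(
    saved: Sequence[Optional[int]], requested: Sequence[Optional[int]]
) -> bool:
    """Return True if two static shapes could describe the same tensor.

    Illustrative helper (not a TensorFlow API): ranks must match, and each
    pair of dimensions must be equal unless one of them is unknown (None).
    """
    if len(saved) != len(requested):
        return False
    return all(
        a is None or b is None or a == b for a, b in zip(saved, requested)
    )


# Shapes taken from the error message above.
restored_shape = (94949, 512)  # shape of the value being restored
model_shape = (94949, 128)     # shape requested by the new model definition

print(shapes_compatible(restored_shape, model_shape))    # False
print(shapes_compatible(restored_shape, (94949, 512)))   # True
```

The vocabulary dimension (94949) matches in both shapes; only the embedding depth differs, which is the value that was changed from 512 to 128.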

My model function in multi_features_transformer.py is:

import opennmt as onmt

def model():
    return onmt.models.Transformer(
        source_inputter=onmt.inputters.ParallelInputter([
            onmt.inputters.WordEmbedder(embedding_size=128),
            onmt.inputters.WordEmbedder(embedding_size=16),
            onmt.inputters.WordEmbedder(embedding_size=64)],
            reducer=onmt.layers.ConcatReducer()),
        target_inputter=onmt.inputters.WordEmbedder(embedding_size=128),
        num_layers=6,
        num_units=128,
        num_heads=8,
        ffn_inner_dim=2048,
        dropout=0.1,
        attention_dropout=0.1,
        ffn_dropout=0.1)

Please help me resolve this issue. What is going wrong when I change the embedding size from 512 to 128?

Thanks

ishaansharma · December 14, 2019, 10:39am

This is my yml file.

model_dir: folder/run/

data:
  train_features_file:
    - folder/f1
    - folder/f2
    - folder/f3
  train_labels_file: folder/t1
  source_1_vocabulary: folder/f1_vocab
  source_2_vocabulary: folder/f2_vocab
  source_3_vocabulary: folder/f3_vocab
  target_vocabulary: folder/t1_vocab

  # (required for train_end_eval and eval run types).
  eval_features_file: 
    - folder/f1_val
    - folder/f2_val
    - folder/f3_val
  eval_labels_file: folder/t1_val


  # (optional) For language models, configure sequence control tokens (usually
  # represented as <s> and </s>). For example, enabling "start" and disabling "end"
  # allows nonconditional and unbounded generation (default: start=false, end=true).
  #
  # Advanced users could also configure this parameter for seq2seq models with e.g.
  # source_sequence_controls and target_sequence_controls.
  sequence_controls:
    start: true
    end: true

# Model and optimization parameters.
params:
  # The optimizer class name in tf.keras.optimizers or tfa.optimizers.
  optimizer: Adam
  # (optional) Additional optimizer parameters as defined in their documentation.
  # If weight_decay is set, the optimizer will be extended with decoupled weight decay.
  optimizer_params:
    beta_1: 0.8
    beta_2: 0.998
  learning_rate: 1.0

  # (optional) If set, overrides all dropout values configured in the model definition.
  dropout: 0.1

  # (optional) List of layers to not optimize.
  freeze_layers:
    - "encoder/layers/0"
    - "decoder/output_layer"

  # (optional) Weights regularization penalty (default: null).
  regularization:
    type: l2  # can be "l1", "l2", "l1_l2" (case-insensitive).
    scale: 1e-4  # if using "l1_l2" regularization, this should be a YAML list.

  # (optional) Average loss in the time dimension in addition to the batch dimension
  # (default: true when using "tokens" batch type, false otherwise).
  average_loss_in_time: false

  # (optional) The type of learning rate decay (default: null). See:
  #  * https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules
  #  * opennmt/schedules/lr_schedules.py
  # This value may change the semantics of other decay options. See the documentation or the code.
  decay_type: NoamDecay
  # (optional unless decay_type is set) Decay parameters.
  decay_params:
    model_dim: 128
    warmup_steps: 100

  # (optional) The learning rate minimum value (default: 0).
  minimum_learning_rate: 0.00001

  label_smoothing: 0.1

  # (optional) Width of the beam search (default: 1).
  beam_width: 5
  # (optional) Number of hypotheses to return (default: 1). Set 0 to return all
  # available hypotheses. This value is also set by infer/n_best.
  num_hypotheses: 1
  # (optional) Length penalty weight to use during beam search (default: 0).
  length_penalty: 0.2
  # (optional) Coverage penalty weight to use during beam search (default: 0).
  coverage_penalty: 0.2
  # (optional) Sample predictions from the top K most likely tokens (requires
  # beam_width to 1). If 0, sample from the full output distribution (default: 1).
  sampling_topk: 1
  # (optional) High temperatures generate more random samples (default: 1).
  sampling_temperature: 1
  # (optional) Sequence of noise to apply to the decoding output. Each element
  # should be a noise type (can be: "dropout", "replacement", "permutation") and
  # the module arguments
  # (see http://opennmt.net/OpenNMT-tf/package/opennmt.data.noise.html)
  decoding_noise:
    - dropout: 0.1
    - replacement: [0.1, ｟unk｠]
    - permutation: 3
  # (optional) Define the subword marker. This is useful to apply noise at the
  # word level instead of the subword level (default: ￭).

  # (optional) Replace unknown target tokens by the original source token with the
  # highest attention (default: false).
  replace_unknown_target: true

  # (optional) The type of guided alignment cost to compute (can be: "null", "ce", "mse",
  # default: "null").
  guided_alignment_type: mse
  # (optional) The weight of the guided alignment cost (default: 1).
  guided_alignment_weight: 1

  # (optional) Enable contrastive learning mode, see
  # https://www.aclweb.org/anthology/P19-1623 (default: false).
  # See also "decoding_subword_token" that is used by this mode.
  contrastive_learning: false
  # (optional) The value of the parameter eta in the max-margin loss (default: 0.1).
  max_margin_eta: 0.1

# Training options.
train:
  # (optional when batch_type=tokens) If not set, the training will search the largest
  # possible batch size.
  batch_size: 64
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: tokens
  # (optional) Tune gradient accumulation to train with at least this effective batch size
  # (default: null).
  effective_batch_size: 25000

  # (optional) Save a checkpoint every this many steps (default: 5000).
  save_checkpoints_steps: 10000
  # (optional) How many checkpoints to keep on disk.
  keep_checkpoint_max: 3

  # (optional) Dump summaries and logs every this many steps (default: 100).
  save_summary_steps: 100

  # (optional) Maximum training step. If not set, train forever.
  max_step: 1000000
  # (optional) If true, makes a single pass over the training data (default: false).
  single_pass: false

  # (optional) The maximum length of feature sequences during training (default: null).
  maximum_features_length: 70
  # (optional) The maximum length of label sequences during training (default: null).
  maximum_labels_length: 70

  # (optional) The width of the length buckets to select batch candidates from.
  # A smaller value means less padding and increased efficiency. (default: 1).
  length_bucket_width: 1

  # (optional) The number of elements from which to sample during shuffling (default: 500000).
  # Set 0 or null to disable shuffling, -1 to match the number of training examples.
  sample_buffer_size: 500000

  # (optional) Number of checkpoints to average at the end of the training to the directory
  # model_dir/avg (default: 0).
  average_last_checkpoints: 8

# (optional) Evaluation options.
eval:
  # (optional) The batch size to use (default: 32).
  batch_size: 30

  # (optional) Evaluate every this many steps (default: 5000).
  steps: 5000

  # (optional) Save evaluation predictions in model_dir/eval/.
  save_eval_predictions: false
  # (optional) Evaluator or list of evaluators that are called on the saved evaluation predictions.
  # Available evaluators: bleu, rouge
  external_evaluators: bleu

  # (optional) Export a SavedModel when a metric has the best value so far (default: null).
  export_on_best: bleu

  # (optional) Early stopping condition.
  # Should be read as: stop the training if "metric" did not improve more
  # than "min_improvement" in the last "steps" evaluations.
  early_stopping:
    # (optional) The target metric name (default: "loss").
    metric: bleu
    # (optional) The metric should improve at least by this much to be considered as an improvement (default: 0)
    min_improvement: 0.01
    steps: 4

# (optional) Inference options.
infer:
  # (optional) The batch size to use (default: 1).
  batch_size: 10

  # (optional) For compatible models, the number of hypotheses to output (default: 1).
  # This sets the parameter params/num_hypotheses.
  n_best: 1
  # (optional) For compatible models, also output the score (default: false).
  with_scores: false
  # (optional) For compatible models, also output the alignments (can be: null, hard, soft,
  # default: null).
  with_alignments: null

  # (optional) The width of the length buckets to select batch candidates from.
  # If set, the test data will be sorted by length to increase the translation
  # efficiency. The predictions will still be outputted in order as they are
  # available (default: 0).
  length_bucket_width: 5

# (optional) Scoring options.
score:
  # (optional) The batch size to use (default: 64).
  batch_size: 64
  # (optional) Also report token-level cross entropy.
  with_token_level: false
  # (optional) Also output the alignments (can be: null, hard, soft, default: null).
  with_alignments: null

guillaumekln · December 14, 2019, 12:08pm

Most likely you have a checkpoint in folder/run with the previous size of 512. You need to remove it or change model_dir.
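
A quick way to check for leftover checkpoint files is sketched below. It assumes the usual TensorFlow checkpoint naming (a `checkpoint` state file plus `ckpt-<step>.index` and `ckpt-<step>.data-*` files); `find_checkpoint_files` is a hypothetical helper name, and `folder/run` is the model_dir from the YAML file above.

```python
from pathlib import Path
from typing import List


def find_checkpoint_files(model_dir: str) -> List[str]:
    """List files in model_dir that look like TensorFlow checkpoint files.

    Assumes the usual naming: a "checkpoint" state file plus
    "ckpt-<step>.index" / "ckpt-<step>.data-*" files.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        name = path.name
        if name == "checkpoint" or name.startswith("ckpt-"):
            found.append(name)
    return found


if __name__ == "__main__":
    stale = find_checkpoint_files("folder/run")  # model_dir from the YAML file
    if stale:
        print("Existing checkpoint files:", stale)
    else:
        print("No checkpoint files found.")
```

If the list is not empty, the new 128-dimensional model would be restored from weights saved with the old size, so either clear the directory or point model_dir somewhere else.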

Also, I strongly recommend not copying all existing parameters into the YAML file; set only what you need (see e.g. the quickstart).

tel34 · December 14, 2019, 1:39pm

I can add that I’ve been getting excellent results training Transformer models using the quickstart with --auto_config.

ishaansharma · December 16, 2019, 5:14am

My model folder was empty when I got this error, and I still got the error.
Thanks.

ishaansharma · December 16, 2019, 5:19am

I am also running the training with --auto_config, and I pass the multi-feature Transformer model with --model:

onmt-main --model config/models/multi_features_transformer.py --config data.yml --auto_config train --num_gpus 1

guillaumekln · December 17, 2019, 2:01pm

Can you post the complete logs if the error is still happening?

ishaansharma · December 18, 2019, 5:36am

The error is resolved. The model folder must be empty before running with a fresh model configuration.

WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.

Thanks @guillaumekln