Evaluation error during training

Hi,
I’m using the OpenNMT TransformerTiny architecture to train a model. During the evaluation that runs while training, I got an unexpected error, which I report below. I’m using OpenNMT-tf version 2.23.0.

Running evaluation for step 5000
Traceback (most recent call last):
File "/usr/local/bin/onmt-main", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/opennmt/bin/main.py", line 308, in main
runner.train(
File "/usr/local/lib/python3.8/dist-packages/opennmt/runner.py", line 276, in train
summary = trainer(
File "/usr/local/lib/python3.8/dist-packages/opennmt/training.py", line 134, in __call__
early_stop = self._evaluate(
File "/usr/local/lib/python3.8/dist-packages/opennmt/training.py", line 192, in _evaluate
evaluator(step)
File "/usr/local/lib/python3.8/dist-packages/opennmt/evaluation.py", line 319, in __call__
loss, predictions = self._eval_fn(source, target)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

File "/usr/local/lib/python3.8/dist-packages/opennmt/models/model.py", line 162, in evaluate  *
    outputs, predictions = self(features, labels=labels)
File "/usr/local/lib/python3.8/dist-packages/opennmt/models/model.py", line 102, in __call__  *
    outputs, predictions = super().__call__(
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
    raise e.with_traceback(filtered_tb) from None

ValueError: Exception encountered when calling layer "transformer_tiny_1" (type TransformerTiny).

in user code:

    File "/usr/local/lib/python3.8/dist-packages/opennmt/models/sequence_to_sequence.py", line 181, in call  *
        predictions = self._dynamic_decode(
    File "/usr/local/lib/python3.8/dist-packages/opennmt/models/sequence_to_sequence.py", line 380, in _dynamic_decode  *
        target_tokens, sampled_length = _add_noise(
    File "/usr/local/lib/python3.8/dist-packages/opennmt/models/sequence_to_sequence.py", line 640, in _add_noise  *
        return noiser(tokens, lengths, keep_shape=True)
    File "/usr/local/lib/python3.8/dist-packages/opennmt/data/noise.py", line 69, in __call__  *
        return self._call(tokens, sequence_length, keep_shape, probability)
    File "/usr/local/lib/python3.8/dist-packages/opennmt/data/noise.py", line 87, in _call  *
        raise ValueError("unsupported rank %d for WordNoiser input" % input_rank)

    ValueError: unsupported rank 3 for WordNoiser input


Call arguments received:
  • features={'length': 'tf.Tensor(shape=(None,), dtype=int32)', 'tokens': 'tf.Tensor(shape=(None, None), dtype=string)', 'ids': 'tf.Tensor(shape=(None, None), dtype=int64)', 'index': 'tf.Tensor(shape=(None,), dtype=int64)'}
  • labels={'length': 'tf.Tensor(shape=(None,), dtype=int32)', 'tokens': 'tf.Tensor(shape=(None, None), dtype=string)', 'ids': 'tf.Tensor(shape=(None, None), dtype=int64)', 'ids_out': 'tf.Tensor(shape=(None, None), dtype=int64)'}
  • training=None
  • step=None

Hi,

Can you post the configuration you are using?

This is the configuration file:

model_dir: exp-folder

data:
  # (required for train run type).
  train_features_file: exp-folder/data/src-train-lim.txt
  train_labels_file:  exp-folder/data/tgt-train-lim.txt

  # (required for train_end_eval and eval run types).
  eval_features_file: exp-folder/data/src-val-lim.txt
  eval_labels_file: exp-folder/data/tgt-val-lim.txt

  # (optional) Models may require additional resource files (e.g. vocabularies).
  source_vocabulary: exp-folder/data/src-vocab-lim.txt
  target_vocabulary: exp-folder/data/tgt-vocab-lim.txt

  # (optional) During export save the vocabularies as model assets, otherwise embed
  # them in the graph itself (default: True).
  export_vocabulary_assets: True


# Model and optimization parameters.
params:
  optimizer: Adam
  optimizer_params:
    beta_1: 0.8
    beta_2: 0.998
  learning_rate: 1.0

  # (optional) If set, overrides all dropout values configured in the model definition.
  dropout: 0.3

  # (optional) Weights regularization penalty (default: null).
  regularization:
    type: l2  # can be "l1", "l2", "l1_l2" (case-insensitive).
    scale: 1e-4  # if using "l1_l2" regularization, this should be a YAML list.

  # (optional) Average loss in the time dimension in addition to the batch dimension
  # (default: true when using "tokens" batch type, false otherwise).
  average_loss_in_time: false

  decay_type: NoamDecay
  decay_params:
    model_dim: 512
    warmup_steps: 4000
  # (optional) The number of training steps that make 1 decay step (default: 1).
  decay_step_duration: 1
  # (optional) After how many steps to start the decay (default: 0).
  start_decay_steps: 50000

  # (optional) The learning rate minimum value (default: 0).
  minimum_learning_rate: 0.0001

  # (optional) The label smoothing value.
  label_smoothing: 0.1

  # (optional) Width of the beam search (default: 1).
  beam_width: 5
  # (optional) Number of hypotheses to return (default: 1). Set 0 to return all
  # available hypotheses. This value is also set by infer/n_best.
  num_hypotheses: 1
  # (optional) Length penalty weight to use during beam search (default: 0).
  length_penalty: 0.2
  # (optional) Coverage penalty weight to use during beam search (default: 0).
  coverage_penalty: 0.2
  # (optional) Sample predictions from the top K most likely tokens (requires
  # beam_width set to 1). If 0, sample from the full output distribution (default: 1).
  sampling_topk: 1
  # (optional) High temperatures generate more random samples (default: 1).
  sampling_temperature: 1
  # (optional) Sequence of noise to apply to the decoding output. Each element
  # should be a noise type (can be: "dropout", "replacement", "permutation") and
  # the module arguments
  # (see https://opennmt.net/OpenNMT-tf/package/opennmt.data.noise.html)
  decoding_noise:
    - dropout: 0.1
    - replacement: [0.1, ⦅unk⦆]
    - permutation: 3
  # (optional) Define the subword marker. This is useful to apply noise at the
  # word level instead of the subword level (default: ■).
  decoding_subword_token: ■
  # (optional) Whether decoding_subword_token is used as a spacer (as in SentencePiece)
  # or a joiner (as in BPE).
  # If unspecified, will infer directly from decoding_subword_token.
  decoding_subword_token_is_spacer: false
  # (optional) Minimum length of decoded sequences, end token excluded (default: 0).
  minimum_decoding_length: 0
  # (optional) Maximum length of decoded sequences, end token excluded (default: 250).
  maximum_decoding_length: 250

  # (optional) Replace unknown target tokens by the original source token with the
  # highest attention (default: false).
  replace_unknown_target: false

  # (optional) The type of guided alignment cost to compute (can be: "null", "ce", "mse",
  # default: "null").
  guided_alignment_type: null
  # (optional) The weight of the guided alignment cost (default: 1).
  guided_alignment_weight: 1

  # (optional) Enable contrastive learning mode, see
  # https://www.aclweb.org/anthology/P19-1623 (default: false).
  # See also "decoding_subword_token" that is used by this mode.
  contrastive_learning: false
  # (optional) The value of the parameter eta in the max-margin loss (default: 0.1).
  max_margin_eta: 0.1

# Training options.
train:
  # (optional when batch_type=tokens) If not set, the training will search the largest
  # possible batch size.
  batch_size: 16
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: examples
  # (optional) Tune gradient accumulation to train with at least this effective batch size
  # (default: null).
  effective_batch_size: 2500

  # (optional) Save a checkpoint every this many steps (default: 5000).
  save_checkpoints_steps: null
  # (optional) How many checkpoints to keep on disk.
  keep_checkpoint_max: 3

  # (optional) Dump summaries and logs every this many steps (default: 100).
  save_summary_steps: 100

  # (optional) Maximum training step. If not set, train forever.
  max_step: 1000000
  # (optional) If true, makes a single pass over the training data (default: false).
  single_pass: false

  # (optional) The maximum length of feature sequences during training (default: null).
  maximum_features_length: 70
  # (optional) The maximum length of label sequences during training (default: null).
  maximum_labels_length: 70

  # (optional) The width of the length buckets to select batch candidates from.
  # A smaller value means less padding and increased efficiency. (default: 1).
  length_bucket_width: 1
  # (optional) The number of elements from which to sample during shuffling (default: 500000).
  # Set 0 or null to disable shuffling, -1 to match the number of training examples.
  sample_buffer_size: 500000

  # (optional) Moving average decay. Reasonable values are close to 1, e.g. 0.9999, see
  # https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
  # (default: null)
  moving_average_decay: 0.9999
  # (optional) Number of checkpoints to average at the end of the training to the directory
  # model_dir/avg (default: 0).
  average_last_checkpoints: 8

# (optional) Evaluation options.
eval:
  # (optional) The batch size to use (default: 32).
  batch_size: 30
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: examples

  # (optional) Evaluate every this many steps (default: 5000).
  steps: 5000

  # (optional) Save evaluation predictions in model_dir/eval/.
  save_eval_predictions: false
  # (optional) Evaluator or list of evaluators that are called on the saved evaluation
  # predictions.
  # Available evaluators: bleu, rouge
  external_evaluators: bleu

  # (optional) The width of the length buckets to select batch candidates from.
  # If set, the eval data will be sorted by length to increase the translation
  # efficiency. The predictions will still be outputted in order as they are
  # available (default: 0).
  length_bucket_width: 5

  # (optional) Export a model when a metric has the best value so far (default: null).
  export_on_best: bleu
  # (optional) Format of the exported model (can be: "saved_model", "checkpoint",
  # "ctranslate2", "ctranslate2_int8", "ctranslate2_int16", "ctranslate2_float16",
  # default: "saved_model").
  export_format: saved_model
  # (optional) Maximum number of exports to keep on disk (default: 5).
  max_exports_to_keep: 5

  # (optional) Early stopping condition.
  # Should be read as: stop the training if "metric" did not improve more
  # than "min_improvement" in the last "steps" evaluations.
  early_stopping:
    # (optional) The target metric name (default: "loss").
    metric: bleu
    # (optional) The metric should improve at least by this much to be considered
    # as an improvement (default: 0)
    min_improvement: 0.01
    steps: 4

# (optional) Inference options.
infer:
  # (optional) The batch size to use (default: 16).
  batch_size: 10
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: examples

  # (optional) For compatible models, the number of hypotheses to output (default: 1).
  # This sets the parameter params/num_hypotheses.
  n_best: 1
  # (optional) For compatible models, also output the score (default: false).
  with_scores: false
  # (optional) For compatible models, also output the alignments
  # (can be: null, hard, soft, default: null).
  with_alignments: null
  # (optional) The width of the length buckets to select batch candidates from.
  # If set, the test data will be sorted by length to increase the translation
  # efficiency. The predictions will still be outputted in order as they are
  # available (default: 0).
  length_bucket_width: 5

# (optional) Scoring options.
score:
  # (optional) The batch size to use (default: 64).
  batch_size: 64
  # (optional) Also report token-level cross entropy.
  with_token_level: false
  # (optional) Also output the alignments (can be: null, hard, soft, default: null).
  with_alignments: null  

You missed this warning in the documentation:

You should NOT copy and use this configuration; instead, you should only define the parameters that you need.

So I suggest starting from a minimal configuration and adding options as needed:

model_dir: exp-folder

data:
  train_features_file: exp-folder/data/src-train-lim.txt
  train_labels_file:  exp-folder/data/tgt-train-lim.txt
  eval_features_file: exp-folder/data/src-val-lim.txt
  eval_labels_file: exp-folder/data/tgt-val-lim.txt
  source_vocabulary: exp-folder/data/src-vocab-lim.txt
  target_vocabulary: exp-folder/data/tgt-vocab-lim.txt
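
Judging from the traceback, the decoding_noise block copied from the example configuration appears to be what triggers the WordNoiser call during evaluation, so leaving it out (along with the other options you don't actually need) should avoid the error.

With the minimal configuration above, training can then be launched as in the quickstart, for example (assuming the file is saved as config.yml and you keep using the TransformerTiny model):

onmt-main --model_type TransformerTiny --config config.yml --auto_config train --with_eval

The --auto_config flag fills in the recommended values for this model, and you can still override individual parameters in config.yml later if needed.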

Thanks, I’ll try with a minimal configuration.