Indexed vocabularies issue

I want to use parallel inputs for training, so I also used indexed vocabularies based on the documentation.
I am using OpenNMT-tf version 2.8.

but I get this error:

```
Traceback (most recent call last):
  File "/usr/local/bin/onmt-main", line 8, in <module>
  File "/usr/local/lib/python3.6/dist-packages/opennmt/bin/", line 204, in main
  File "/usr/local/lib/python3.6/dist-packages/opennmt/", line 147, in train
    checkpoint, config = self._init_run(num_devices=num_devices, training=True)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/", line 134, in _init_run
    return self._init_model(config), config
  File "/usr/local/lib/python3.6/dist-packages/opennmt/", line 120, in _init_model
    model.initialize(config["data"], params=config["params"])
  File "/usr/local/lib/python3.6/dist-packages/opennmt/models/", line 127, in initialize
    super(SequenceToSequence, self).initialize(data_config, params=params)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/models/", line 86, in initialize
  File "/usr/local/lib/python3.6/dist-packages/opennmt/models/", line 426, in initialize
    super(SequenceToSequenceInputter, self).initialize(data_config, asset_prefix=asset_prefix)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/inputters/", line 209, in initialize
    data_config, asset_prefix=_get_asset_prefix(asset_prefix, inputter, i))
  File "/usr/local/lib/python3.6/dist-packages/opennmt/inputters/", line 381, in initialize
    super(WordEmbedder, self).initialize(data_config, asset_prefix=asset_prefix)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/inputters/", line 254, in initialize
    data_config, "vocabulary", prefix=asset_prefix, required=True)
  File "/usr/local/lib/python3.6/dist-packages/opennmt/inputters/", line 208, in _get_field
    raise ValueError("Missing field '%s' in the data configuration" % key)
ValueError: Missing field 'source_vocabulary' in the data configuration
```

**This is my YAML:**

```yaml
model_dir: Deen_transformer
gpu_allow_growth: true
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
- data/tok/
eval_features_file: data/tok/
eval_labels_file: data/tok/test.en.tok
source_1_vocabulary: data/vocab/
source_2_vocabulary: data/vocab/
source_3_vocabulary: data/vocab/
source_4_vocabulary: data/vocab/
source_5_vocabulary: data/vocab/
source_6_vocabulary: data/vocab/
source_7_vocabulary: data/vocab/
target_1_vocabulary: data/vocab/
target_2_vocabulary: data/vocab/
target_3_vocabulary: data/vocab/
target_4_vocabulary: data/vocab/
target_5_vocabulary: data/vocab/
target_6_vocabulary: data/vocab/
target_7_vocabulary: data/vocab/
```



What is your model definition?

I use the Transformer:

```
onmt-main --model_type Transformer \
    --config config/GMT_deen.yml --auto_config \
    train --with_eval
```

A multi-feature Transformer is not the same architecture as the default Transformer. You should provide a custom model definition to at least configure the embedding dimension of each feature and how they are merged.

See for example this model which defines 3 input features that are concatenated:

My mistake, sorry…
Thank you for the quick response.

Also note that target features are not supported.

thank you.
In this case, should I build a single target vocabulary file from all my `train_labels_file` entries?

You should only build the vocabulary for the actual target file.

Now that I read your YAML configuration file again, are those training files actually parallel input features? From the names, it looks like they are unrelated training files (WMT, TED, etc.).

Actually, I think I misunderstood multi-feature inputs.
I want to use parallel inputs and weight them for training, based on the documentation, which says:

> Parallel inputs require indexed vocabularies

“parallel” means “aligned” in this context. Are your files actually aligned?

If not, maybe you are looking for weighted inputs? Here it is just about interleaving data coming from multiple datasets and it does not require a different model architecture nor multiple vocabularies.
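A sketch of a weighted-inputs configuration under that assumption (the file names are hypothetical; `train_files_weights` is the key documented by OpenNMT-tf for weighting multiple training files, and a single non-indexed vocabulary pair suffices):

```yaml
data:
  # Each features file must be aligned with the labels file
  # at the same list position.
  train_features_file:
    - data/tok/wmt.de.tok
    - data/tok/ted.de.tok
  train_labels_file:
    - data/tok/wmt.en.tok
    - data/tok/ted.en.tok
  # Relative sampling weight of each dataset pair.
  train_files_weights:
    - 0.9
    - 0.1
  source_vocabulary: data/vocab/vocab.de
  target_vocabulary: data/vocab/vocab.en
```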

That’s great,
the word “parallel” just confused me. :sweat_smile:
Thank you for your help

2 posts were split to a new topic: How to run dual source Transformer?