ValueError: Missing field 'vocabulary' in the data configuration in V2.5.0

tel34 · January 19, 2020, 4:27pm

I just tried to train with V2.5.0 and training crashed with the message above. Has the structure of the configuration file changed? The relevant part of my config file reads:
data:
  train_features_file: /home/miguel/tf_experiments/eng2turk_tf/data/turkish_medium_sp.en_train.txt
  train_labels_file: /home/miguel/tf_experiments/eng2turk_tf/data/turkish_medium_sp.tr_train.txt
  eval_features_file: /home/miguel/tf_experiments/eng2turk_tf/data/turkish_medium_sp.en_validation.txt
  eval_labels_file: /home/miguel/tf_experiments/eng2turk_tf/data/turkish_medium_sp.tr_validation.txt
  source_vocabulary: /home/miguel/tf_experiments/eng2turk_tf/data/src-vocab.txt
  target_vocabulary: /home/miguel/tf_experiments/eng2turk_tf/data/tgt-vocab.txt

guillaumekln · January 19, 2020, 7:15pm

No, it hasn’t changed, unless a bug was introduced. Can you post your command line and the full error log?

tel34 · January 20, 2020, 1:33am

Hi Guillaume,
Command line:
onmt-main --model_type Transformer --config ./data.yml --auto_config train --with_eval
Full error log:
Traceback (most recent call last):
  File "/home/miguel/tf2_env/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/bin/main.py", line 188, in main
    checkpoint_path=args.checkpoint_path)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/runner.py", line 147, in train
    checkpoint, config = self._init_run(num_devices=num_devices, training=True)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/runner.py", line 134, in _init_run
    return self._init_model(config), config
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/runner.py", line 120, in _init_model
    model.initialize(config["data"], params=config["params"])
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/models/language_model.py", line 46, in initialize
    super(LanguageModel, self).initialize(data_config, params=params)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/models/model.py", line 81, in initialize
    self.examples_inputter.initialize(data_config)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/models/language_model.py", line 145, in initialize
    super(LanguageModelInputter, self).initialize(data_config, asset_prefix=asset_prefix)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/inputters/text_inputter.py", line 381, in initialize
    super(WordEmbedder, self).initialize(data_config, asset_prefix=asset_prefix)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/inputters/text_inputter.py", line 253, in initialize
    data_config, "vocabulary", prefix=asset_prefix, required=True)
  File "/home/miguel/tf2_env/lib/python3.5/site-packages/opennmt/inputters/text_inputter.py", line 207, in _get_field
    raise ValueError("Missing field '%s' in the data configuration" % key)
ValueError: Missing field 'vocabulary' in the data configuration

Thanks,
Terence

guillaumekln · January 20, 2020, 9:14am

For some reason, it tries to load a LanguageModel instance. Are you training from scratch? Is it possible that you initially ran the training command with a language model type?
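Judging from the traceback, the vocabulary lookup goes through `_get_field(data_config, "vocabulary", prefix=asset_prefix, required=True)`, so the key that is actually read depends on the inputter's prefix. The sketch below is a hypothetical re-implementation for illustration, not OpenNMT-tf's code. It assumes that a sequence-to-sequence model reads `source_vocabulary`/`target_vocabulary` (prefixes `source_` and `target_`) while a language model inputter uses no prefix and reads plain `vocabulary`. Under that assumption, a data section written for a Transformer fails as soon as a LanguageModel is instantiated.

```python
def get_field(data_config, key, prefix="", required=False):
    """Hypothetical prefixed lookup in a data config dict (not OpenNMT-tf's code)."""
    full_key = prefix + key  # e.g. "source_" + "vocabulary" -> "source_vocabulary"
    value = data_config.get(full_key)
    if value is None and required:
        # Same message format as the error in the traceback.
        raise ValueError("Missing field '%s' in the data configuration" % full_key)
    return value


# Vocabulary entries from the data section in the first post.
data_config = {
    "source_vocabulary": "/home/miguel/tf_experiments/eng2turk_tf/data/src-vocab.txt",
    "target_vocabulary": "/home/miguel/tf_experiments/eng2turk_tf/data/tgt-vocab.txt",
}

# A prefixed lookup (assumed sequence-to-sequence behaviour) finds the file.
print(get_field(data_config, "vocabulary", prefix="source_", required=True))

# An unprefixed lookup (assumed language model behaviour) raises the reported error.
try:
    get_field(data_config, "vocabulary", required=True)
except ValueError as err:
    print(err)
```

The second lookup prints `Missing field 'vocabulary' in the data configuration`, matching the last line of the log, which is consistent with a LanguageModel being built instead of a Transformer.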

tel34 · January 20, 2020, 9:30am

No, this training is from scratch. My config file only contains the data section, and I am relying on --auto_config to provide the rest.

guillaumekln · January 20, 2020, 9:48am

Can you make sure the model_dir directory is empty before starting the training?
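The check suggested above can be scripted so that a from-scratch run never starts on top of files left over from an earlier run. This is a minimal sketch: `model_dir_is_empty` and `ensure_fresh_model_dir` are hypothetical helpers, not OpenNMT-tf functions, and the path you pass in is assumed to be whatever `model_dir` is set to in your configuration.

```python
from pathlib import Path


def model_dir_is_empty(model_dir):
    """Return True if model_dir does not exist yet or contains no entries.

    Hypothetical helper, not part of OpenNMT-tf.
    """
    path = Path(model_dir)
    if not path.exists():
        return True
    return not any(path.iterdir())


def ensure_fresh_model_dir(model_dir):
    """Raise RuntimeError listing leftover entries if model_dir is not empty."""
    if not model_dir_is_empty(model_dir):
        leftovers = sorted(p.name for p in Path(model_dir).iterdir())
        raise RuntimeError(
            "model_dir %s is not empty: %s" % (model_dir, ", ".join(leftovers))
        )
```

Calling `ensure_fresh_model_dir(...)` before launching `onmt-main ... train` makes any leftover files visible up front instead of letting them silently influence the new run. The helper only reports; deleting the directory is left as an explicit decision.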

tel34 · January 20, 2020, 10:09am

I have completely emptied the model directory and no longer get that error. I have never used a language model in OpenNMT-tf, so I am puzzled!