ValueError: Missing field 'source_vocabulary' in the data configuration

shan778 · July 20, 2020, 4:15pm

I want to translate with the pretrained model for the WMT English-German dataset, using the OpenNMT-tf scripts.
The prepare_data.sh script there created a single vocabulary file, and the wmt-ende-sp folder contains no separate vocabulary files for the source and target languages.
So when I run this command:

onmt-main --config data.yml --model_type Transformer --auto_config infer --features_file newstest2017-ende-src.en --predictions_file predict_1.txt

it shows the error in the title.

guillaumekln · July 21, 2020, 7:48am

This script generates a shared vocabulary, so the same file is used as both the source and the target vocabulary.

See the example configuration for how it is set up:

https://github.com/OpenNMT/OpenNMT-tf/blob/master/scripts/wmt/config/wmt_ende.yml

model_dir: wmt_ende_transformer

data:
  train_features_file: data/train.en
  train_labels_file: data/train.de
  eval_features_file: data/valid.en
  eval_labels_file: data/valid.de
  source_vocabulary: data/wmtende.vocab
  target_vocabulary: data/wmtende.vocab

train:
  save_checkpoints_steps: 1000

eval:
  external_evaluators: BLEU

infer:
  batch_size: 32
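
For inference only, the training and evaluation files should not be needed, so a trimmed-down data.yml might look roughly like the sketch below. The paths are assumptions copied from the example configuration above; point model_dir at wherever the downloaded checkpoint actually lives and both vocabulary entries at the single vocabulary file that prepare_data.sh produced.

```yaml
# Sketch of a minimal inference configuration (paths are assumptions, adjust to your setup).
model_dir: wmt_ende_transformer        # directory holding the pretrained checkpoint

data:
  # The vocabulary is shared, so both entries point to the same file.
  source_vocabulary: data/wmtende.vocab
  target_vocabulary: data/wmtende.vocab
```

With both vocabulary entries present, the same onmt-main command from the question should get past the missing-field check.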