Hello Fellow Researchers,
I am trying to run OpenNMT-tf with multiple sources and multiple features, so I want to encode two different sources. For one of the sources, I need to concatenate all of its features and encode them with a single encoder. I am trying to write a custom model definition, custom_model.py, like below:
from opennmt import models, inputters, encoders, layers, decoders

def model():
    return models.SequenceToSequence(
        source_inputter=inputters.ParallelInputter([
            inputters.ParallelInputter([
                inputters.WordEmbedder(embedding_size=3),
                inputters.WordEmbedder(embedding_size=23),
                inputters.WordEmbedder(embedding_size=43),
                inputters.WordEmbedder(embedding_size=3),
                inputters.WordEmbedder(embedding_size=100),
            ], combine_features=True, reducer=layers.ConcatReducer()),  # combine the 5 features of source 1
            inputters.WordEmbedder(embedding_size=300),
        ]),
        target_inputter=inputters.WordEmbedder(embedding_size=300),
        encoder=encoders.ParallelEncoder([
            encoders.RNNEncoder(1, 300, dropout=0.2),
            encoders.RNNEncoder(2, 300, dropout=0.2, bidirectional=True),
        ], outputs_reducer=layers.ConcatReducer(axis=-1)),
        decoder=decoders.AttentionalRNNDecoder(
            num_layers=1,
            num_units=300,
            dropout=0.2))
And the config.yml looks like this:
model_dir: model/

data:
  train_features_file:
    - train_src_1_1.txt
    - train_src_1_2.txt
    - train_src_1_3.txt
    - train_src_1_4.txt
    - train_src_1_5.txt
    - train_src_2.txt
  train_labels_file: train-tgt.txt
  source_1_1_vocabulary: src_1_1_vocab.txt
  source_1_2_vocabulary: src_1_2_vocab.txt
  source_1_3_vocabulary: src_1_3_vocab.txt
  source_1_4_vocabulary: src_1_4_vocab.txt
  source_1_5_vocabulary: src_1_5_vocab.txt
  source_2_vocabulary: src_2_vocab.txt
  eval_features_file:
    - dev_src_1_1.txt
    - dev_src_1_2.txt
    - dev_src_1_3.txt
    - dev_src_1_4.txt
    - dev_src_1_5.txt
    - dev_src_2.txt
  eval_labels_file: dev-tgt.txt
  target_vocabulary: tgt-vocab.txt
params: ....
....
The command I run is:
onmt-main --model custom_model.py --config config/config.yml --auto_config train --with_eval
The error is:
Traceback (most recent call last):
  File "/anaconda3/envs/OpenNMT-tf/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/bin/main.py", line 204, in main
    checkpoint_path=args.checkpoint_path)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/runner.py", line 180, in train
    evaluator = evaluation.Evaluator.from_config(model, config)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/evaluation.py", line 166, in from_config
    exporter=exporters.make_exporter(eval_config.get("export_format", "saved_model")))
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/evaluation.py", line 99, in __init__
    prefetch_buffer_size=1)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/inputters/inputter.py", line 491, in make_evaluation_dataset
    dataset = self.make_dataset([features_file, labels_file], training=False)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 431, in make_dataset
    data_file, training=training)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/inputters/inputter.py", line 274, in make_dataset
    dataset = inputter.make_dataset(data, training=training)
  File "/anaconda3/envs/OpenNMT-tf/lib/python3.7/site-packages/opennmt/inputters/inputter.py", line 269, in make_dataset
    raise ValueError("The number of data files must be the same as the number of inputters")
ValueError: The number of data files must be the same as the number of inputters
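To confirm where the mismatch comes from, I instantiated the model on its own and looked at the top-level source inputter (a quick throwaway snippet, assuming custom_model.py is importable and that features_inputter / inputters are the right attributes to inspect):

from custom_model import model

m = model()
# the top-level source inputter only has 2 children (the nested
# ParallelInputter and the plain WordEmbedder), while the config lists 6 files
print(len(m.features_inputter.inputters))  # -> 2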
So, if I understand correctly, the error occurs because my top-level source inputter has 2 child inputters (one ParallelInputter and one WordEmbedder) while the config lists 6 data files, and the counts don't match. My question is: does OpenNMT-tf support this kind of model, and if not, which classes (encoder, inputter, or both) should I override to make it work? I am using OpenNMT-tf==2.8.
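For what it's worth, the kind of workaround I was imagining is to subclass ParallelInputter and regroup the flat list of data files so that it mirrors the nested inputter structure. This is only a rough, untested sketch (the class name and the [:5] / [5] grouping are my own, and I am not sure the rest of the pipeline would accept the nested datasets), which is why I'd like to know whether there is a proper class to override instead:

from opennmt import inputters

class GroupedParallelInputter(inputters.ParallelInputter):
    """Regroups a flat list of 6 data files to match the nested child inputters."""

    def make_dataset(self, data_file, training=None):
        if isinstance(data_file, list) and len(data_file) == 6:
            # first 5 files -> inner ParallelInputter (the 5 features of source 1),
            # last file     -> plain WordEmbedder (source 2)
            data_file = [data_file[:5], data_file[5]]
        return super().make_dataset(data_file, training=training)

The outer ParallelInputter in model() would then be replaced by GroupedParallelInputter with the same children, but I don't know if that is the intended way to do this.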
Thanks and best regards