The number of data files must be the same as the number of inputters

mayaKaplansky · February 1, 2021, 10:40pm

Hi, I get this error. It seems I am not creating the config file correctly in the notebook:

config = {
    "model_dir": "runFull/",
    "data": {

        "train_features_file":
            "- OpenNMTSrc.txt"
            "  - AgeGroup.txt"
            " - Gender.txt"
             "- JobGroup.txt"
             "- PatientLocation.txt"
        ,
        "train_labels_file": "OpenNMTTgt.txt",
        "source_1_vocabulary": "vocabSrcFull.txt",
        "source_2_vocabulary": "vocabAgeGroup.txt",
        "source_3_vocabulary": "vocabGender.txt",
        "source_4_vocabulary": "vocabJobGroup.txt",
        "source_5_vocabulary": "vocabPatientLocation.txt",
        "target_vocabulary": "vocabTargetFull.txt",
        "eval_features_file":
            "- OpenNMTSrc.txt"
            "  - AgeGroup.txt"
            " - Gender.txt"
             "- JobGroup.txt"
             "- PatientLocation.txt"
        ,
        "eval_labels_file": "OpenNMTTgt.txt",
        "sequence_controls": {
            "start": "true",
            "end": "true",
        },
    },
    "params":{
        "beam_width": 5,
    },
    "train":{
        "batch_size": 3064,
        "batch_type": "examples",
        "max_step": 10000,
        "save_checkpoints_steps" : 5000,
        "keep_checkpoint_max": 10,
        "save_summary_steps": 200,
    },
    "eval":{
        "batch_size": 3064,
        "batch_type": "examples",
        "steps": 200,
        "export_on_best": "loss",
        "export_format": "saved_model",
        "max_exports_to_keep:": 5,
        "early_stopping":{
            "metric": "loss",
            "min_improvement": 0.01,
            "steps": 4,
        },
    },
    "infer":{
        "n_best": 5,
        "with_scores": "true",
    }
}

This is the model:

model = onmt.models.Transformer(
    source_inputter=onmt.inputters.ParallelInputter(
        [
            onmt.inputters.WordEmbedder(embedding_size=512),
            onmt.inputters.WordEmbedder(embedding_size=16),
            onmt.inputters.WordEmbedder(embedding_size=16),
            onmt.inputters.WordEmbedder(embedding_size=16),
            onmt.inputters.WordEmbedder(embedding_size=16),
        ],
        reducer=onmt.layers.ConcatReducer(),
    ),
    target_inputter=onmt.inputters.WordEmbedder(embedding_size=512),
    num_layers=6,
    num_units=512,
    num_heads=8,
    ffn_inner_dim=2048,
    dropout=0.1,
    attention_dropout=0.1,
    ffn_dropout=0.1,
)

Thanks

guillaumekln · February 2, 2021, 8:26am

You are trying to use the YAML list syntax in Python.

A Python list looks like this:

"train_features_file": ["OpenNMTSrc.txt", "AgeGroup.txt", "Gender.txt"]

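To illustrate why the original config raises this error: Python joins adjacent string literals into one string, so the value was a single string rather than a list of file names. Here is a minimal sketch using the file names from the question, assuming one features file per WordEmbedder in the ParallelInputter:

```python
# Adjacent string literals are concatenated by the Python parser,
# so this is ONE string, not a list of files.
broken = (
    "- OpenNMTSrc.txt"
    "  - AgeGroup.txt"
)
print(type(broken).__name__)
print(broken)

# A real Python list: one entry per inputter (five in the question's model).
train_features_file = [
    "OpenNMTSrc.txt",
    "AgeGroup.txt",
    "Gender.txt",
    "JobGroup.txt",
    "PatientLocation.txt",
]
print(len(train_features_file))
```

The same change would apply to "eval_features_file" in the config.
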
mayaKaplansky · February 2, 2021, 8:33am

Thank you for that! I managed to fix it, but now I get: “ValueError: No optimizer is defined”. I thought that any parameter I don’t define myself falls back to its default. Do I need to define all of them?

guillaumekln · February 2, 2021, 1:42pm

A default optimizer is set when using auto_config. If you don’t enable automatic configuration, you should provide such parameters.
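
As a rough sketch of the second option, the optimizer settings can be added to the "params" section of the config. The values below ("Adam" and 0.0001) are illustrative placeholders, not the defaults that auto_config would pick, and the trimmed-down config is only there to keep the example short:

```python
# Hypothetical optimizer settings for illustration; tune them for your model.
explicit_params = {
    "optimizer": "Adam",
    "learning_rate": 0.0001,
}

# Trimmed-down version of the config from the question.
config = {
    "model_dir": "runFull/",
    "params": {"beam_width": 5},
}

# Merge the optimizer settings into the existing "params" section.
config["params"].update(explicit_params)
print(config["params"])

# Alternative (not executed here): let the model supply its default
# parameters, assuming `onmt` is the alias for `opennmt` as in the question.
# runner = onmt.Runner(model, config, auto_config=True)
```

With auto_config enabled, values set in the user config should still take precedence over the model's defaults, so only the parameters you want to change need to be listed.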