Changing a custom model's hyperparameters via CLI/YAML

When using a custom model via --model, can you still override properties such as the number of layers on the command line? For example, with the custom model below, is it possible to make num_layers configurable via the command line or the YAML configuration file? (I have not worked with the TensorFlow version before, but it seems that YAML is required and command-line equivalents are not allowed.)

import opennmt

class MyCustomTransformer(opennmt.models.Transformer):
    def __init__(self):
        super().__init__(
            source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
            target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
            num_layers=6,
            num_units=512,
            num_heads=8,
            ffn_inner_dim=2048,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_embeddings=opennmt.models.EmbeddingsSharingLevel.ALL)

No, this is not possible.

All parameters that change the model structure should be defined in the Python model file. The main reason is that these parameters depend on the model you are training, and there may be multiple uses of num_layers in the model structure.

If you prefer using the command line, you could still write a small client script (similar to opennmt/bin/main.py) that exposes this kind of parameter.
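Such a wrapper could look roughly like this. This is a hypothetical sketch: the --num_layers/--num_units flags and the build_model helper are not part of OpenNMT-tf; they just show how a script could turn command-line flags into constructor arguments before handing the model to the runner.

```python
# Hypothetical wrapper script exposing structural hyperparameters on the
# command line. Flag names and build_model() are illustrative, not an
# OpenNMT-tf API.
import argparse


def make_parser():
    parser = argparse.ArgumentParser(description="Train a custom Transformer")
    parser.add_argument("--config", required=True, help="YAML run configuration")
    parser.add_argument("--num_layers", type=int, default=6)
    parser.add_argument("--num_units", type=int, default=512)
    return parser


def build_model(args):
    # Imported lazily so the parser itself has no OpenNMT-tf dependency.
    import opennmt
    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=args.num_units),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=args.num_units),
        num_layers=args.num_layers,
        num_units=args.num_units,
        num_heads=8,
        ffn_inner_dim=4 * args.num_units,
        dropout=0.1,
        attention_dropout=0.1,
        ffn_dropout=0.1)

# Typical use: args = make_parser().parse_args(); model = build_model(args);
# then pass model and args.config to the OpenNMT-tf runner.
```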

Wouldn’t that imply that this should also not be possible in the PyTorch version? In the PT version one can simply pass --layers 6, which is convenient on the one hand but also cumbersome because you end up with very long commands. But it seems that the ambiguity that you refer to is not a problem there.

I guess the ideal is to combine the best of both worlds: allow for a config file, inline arguments, or both. When both are given, the inline arguments override the config file.

The difference is that OpenNMT-tf supports combining multiple encoders in sequence, in parallel, or even in a nested structure. In this case it’s not possible to clearly expose command line options.
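The ambiguity can be made concrete with a toy model of nested encoders. The classes below are stand-ins, not the OpenNMT-tf encoder API; they only show that once two encoders each have their own num_layers, a flat flag like --layers has no single target.

```python
# Toy illustration of why a flat CLI flag is ambiguous for nested encoders.
# These classes are stand-ins, not OpenNMT-tf's encoder classes.
from dataclasses import dataclass, field


@dataclass
class Encoder:
    num_layers: int


@dataclass
class SequentialEncoder:
    encoders: list = field(default_factory=list)


model_encoder = SequentialEncoder(encoders=[
    Encoder(num_layers=2),   # e.g. a shallow first stage
    Encoder(num_layers=4),   # e.g. a deeper second stage
])
# Which num_layers should "--layers 6" change? A flat flag cannot say,
# whereas the Python definition is explicit about each encoder.
```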

I agree with you that for basic models the command line is simpler, but with OpenNMT-tf we chose a highly modular design that makes this more difficult. On the other hand, standard models (such as TransformerBase) are available on the command line without requiring any parameters.

That makes sense. A possible, but intricate, solution could be to allow for the naming of encoders and read in specific configurations for each encoder. But that’s just me day-dreaming.
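For what the named-encoder idea might look like: purely hypothetical, not an OpenNMT-tf feature, but a configuration keyed by encoder name would resolve the ambiguity above.

```python
# Hypothetical per-encoder configuration, keyed by a user-chosen encoder name.
# Names and keys are invented for illustration only.
encoder_configs = {
    "source_encoder": {"num_layers": 6, "num_units": 512},
    "context_encoder": {"num_layers": 2, "num_units": 256},
}


def layers_for(name):
    """Look up num_layers for a named encoder."""
    return encoder_configs[name]["num_layers"]
```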

Thanks again!
