Configuring the LstmCnnCrfTagger with YAML

I have trouble understanding how to configure the LstmCnnCrfTagger, whose initialization is defined as follows:

class LstmCnnCrfTagger(sequence_tagger.SequenceTagger):
    """Defines a bidirectional LSTM-CNNs-CRF as described in https://arxiv.org/abs/1603.01354."""

    def __init__(self):
        super().__init__(
            inputter=inputters.MixedInputter(
                [
                    inputters.WordEmbedder(embedding_size=100),
                    inputters.CharConvEmbedder(
                        embedding_size=30,
                        num_outputs=30,
                        kernel_size=3,
                        stride=1,
                        dropout=0.5,
                    ),
                ],
                dropout=0.5,
            ),
            encoder=encoders.RNNEncoder(
                num_layers=1,
                num_units=400,
                bidirectional=True,
                dropout=0.5,
                residual_connections=False,
                cell_class=tf.keras.layers.LSTMCell,
            ),
            crf_decoding=True,
        )

Specifically, I am wondering how to override the number of encoder layers (num_layers), as, in my experience with other frameworks, a single encoder layer is insufficient to get good results on my downstream tasks. I could not find any examples in the Docs directory showing how, or where, to configure the number of layers. So it is not clear to me whether I need to define my own model, or whether this value can be changed at runtime.

Additionally, I would like to know whether there is a way to confirm that the settings in the YAML file actually have an effect on the model. It seems that as long as the YAML parses, the code executes, even if option names are misspelled, placed in the wrong section, etc.

These parameters cannot be changed in the YAML file.

You should define a custom model definition as described here: Model — OpenNMT-tf 2.31.0 documentation

You can copy the class definition that you included in your post, but you should update some module paths to use the public paths instead (e.g. sequence_tagger.SequenceTagger → opennmt.models.SequenceTagger, inputters.MixedInputter → opennmt.inputters.MixedInputter, etc.)

Thanks, Guillaume, I actually got that working.

However, some aspects of the YAML configuration remain unclear to me, e.g. in the above-quoted definition, where the word embeddings are given embedding_size=100. The file embeddings.md from the docs explains how to use pre-trained embeddings:

data:
  source_embedding:
    path: data/glove/glove-100000.txt
    with_header: True
    case_insensitive: True
    trainable: False

But how does that relate to the definition of the model? Is the embedding size given a new value if the pre-trained embeddings have a different dimension?

If you are using pre-trained embeddings, you can remove embedding_size=100. This is mentioned in the documentation of the inputter: WordEmbedder — OpenNMT-tf 2.31.0 documentation