I ran your script, but the perplexity suddenly began to grow. I stopped training and resumed with -continue, but it soon became large again. Do you know the reason?
Is there an easy way to explore a pretrained model – for example getting the model configuration (number of layers for encoder, decoder, bidirectional or not) as well as weights of all the Linear layers for Encoder, Decoder and Attention? Thanks for the help in advance.
You may want to take a look at the
Model configurations can be displayed with a simple:
and the function
releaseModel traverses parts of the model (e.g. the encoder). With some