[Code Understanding] Where are different models 'used' in the source?

(May) #1


As a disclaimer, I am relatively new to ML, NLP and OpenNMT, but I of course have a strong desire to learn. I’ve learned about LSTMs, and now want to peer into OpenNMT’s implementation of them.

I am interested in understanding, from an implementation perspective, how the underlying source uses different models during things like training. In particular, I am interested in the LSTM model being used (the specifics of it, as the documentation doesn’t give many details) and how it is implemented (and then used).

I have found the LSTM.lua file, but it seems like this file is only for building the units and stacking them (to make blocks) and to build layers out of those blocks. In other words, it seems like this file contains only code for building the model, not actually running training/inference/etcetera.

What I can’t seem to find is the code in which this model is actually used in training and/or predicting.

Please let me know if I’ve been incorrect in anything I’ve said, and of course, if you could provide any insight on how the code is working! Resources or tips for how I can wrap my head around the codebase in general would also be greatly appreciated! And, of course, if this is a duplicate, feel free to redirect me!

(Guillaume Klein) #2


There are several layers of abstraction in the code. It may make the code look more complex but it is necessary as we want to support many features and model configurations. To give a complete answer, there are 4 levels of modules:

  • Torch modules
  • Network parts
  • Networks
  • Models

Torch modules

These modules are low level building blocks that define how to construct, use and serialize themselves. Their functional scope is very narrow, for example computing the softmax of a tensor. Torch provides a lot of them but we might need to add new ones, for example onmt/modules/JoinReplicateTable.lua.

Network parts

onmt/modules/LSTM.lua is a network part. It just defines a static computation graph that can be used as is or integrated in a larger network.


Networks

Networks are higher level modules that define a full computation graph using Torch modules, network parts or even other networks. They also describe how to forward and backward a batch through this graph. For example, onmt/modules/Encoder.lua is a network that uses a word embedding layer and a stack of RNNs. onmt/modules/BiEncoder.lua is another network that uses a combination of 2 Encoders.


Models

Models gather one or several networks to define how to train a particular task, for example sequence to sequence in onmt/Seq2Seq.lua.

Then, models are used in onmt/train/Trainer.lua for training, or in onmt/translate/Translator.lua for inference (in this case, for translation).
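To make the four levels concrete, here is a toy sketch in plain Python rather than Lua Torch, so that it runs without Torch installed. Every name in it (Tanh, Scale, Chain, ToyEncoder, ToyModel, toy_trainer_step) is hypothetical and only mirrors the roles described above; none of it is OpenNMT code, and the squared-error loss is an arbitrary choice for illustration.

```python
import math

class Tanh:
    """'Torch module' level: one narrow operation, forward(input) -> output."""
    def forward(self, xs):
        return [math.tanh(x) for x in xs]

class Scale:
    """Another narrow module: multiply every element by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, xs):
        return [self.weight * x for x in xs]

class Chain:
    """'Network part' level: a fixed graph wired from low-level modules."""
    def __init__(self, *modules):
        self.modules = modules

    def forward(self, xs):
        for module in self.modules:
            xs = module.forward(xs)
        return xs

class ToyEncoder:
    """'Network' level: owns parts and knows how to run a whole batch."""
    def __init__(self):
        self.part = Chain(Scale(0.5), Tanh())

    def forward(self, batch):
        return [self.part.forward(seq) for seq in batch]

class ToyModel:
    """'Model' level: gathers networks and defines the loss for a task."""
    def __init__(self):
        self.encoder = ToyEncoder()

    def loss(self, batch, targets):
        outputs = self.encoder.forward(batch)
        return sum((o - t) ** 2
                   for out, tgt in zip(outputs, targets)
                   for o, t in zip(out, tgt))

def toy_trainer_step(model, batch, targets):
    """'Trainer' role: drives the model. A real trainer would also update weights."""
    return model.loss(batch, targets)

model = ToyModel()
batch = [[0.0, 2.0], [4.0]]       # two sequences of different lengths
targets = [[0.0, 0.0], [0.0]]
print(toy_trainer_step(model, batch, targets))
```

Each level only calls forward on the level below it, which is the property that lets the trainer stay unaware of how an LSTM is wired internally.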

(May) #3

@guillaumekln Wow, thank you for such a comprehensive reply. I really appreciate it! Indeed, the added complexity from abstraction does make the code look scarier, but I agree that it is completely necessary to keep OpenNMT extensible and maintainable. Thanks for illuminating the overarching structure and design of the system. I feel a lot more comfortable with the codebase, but have some lingering doubts.

Although I already feel bad for taking up your time with my question, I was wondering if you could answer my follow up questions to your excellent response:

  1. Network parts, networks and models are all modules themselves, but at different scales?
  2. Torch modules build network parts, which build networks which then build models. Is this kind of hierarchy something from Torch, or something OpenNMT introduces as part of its design?
  3. LSTM.lua is a network part, meaning what it gives is a graph (my guess is this is why nngraph is required) made up of torch modules. ‘Traversing’ this graph causes the necessary computations for an LSTM, which is why it is called a ‘static computation graph’.
  4. Is it accurate to think of torch modules as sometimes being like neurons, where they are simple in scope, such as just evaluating a function over a tensor? I looked up torch modules in the torch documentation, and it seems like it is conceptualized as an input-action-output pipeline that looks something like function(input) -> output.
  5. LSTM is a static computation graph, which is then used to create the Encoder network (which is why I see these mentions of LSTM in the Encoder.lua file!). And then, we combine things like Encoders (and Decoders as well?) in models like Seq2Seq.lua. Finally, in the Trainer (for training) and/or in the Translator (for translating), we use our models such as Seq2Seq.

Does this seem right to you?

Thank you so much for your help. I am beyond appreciative.

(Guillaume Klein) #4
  1. Yes, you can look at it this way.
  2. Torch does not impose any kind of design. Users of the library can do whatever they want, which is why it is sometimes hard to stay consistent in your approach.
  3. Correct.
  4. A neuron has a specific definition and you actually need several modules to build one (*). However, you are right to say that Torch modules have an interface like function(input) -> output.
  5. This is also correct.

Design choices are also impacted by Torch limitations. For example, it has no built-in support for recurrence, which is why we first build a full graph that we replicate manually and optimize for memory space.
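The idea of replicating a one-step graph across time steps can be sketched in plain Python. This is only an illustration under stated assumptions: StepCell, unroll and run_sequence are hypothetical names, the recurrence is a toy tanh unit rather than an LSTM, the replicas are assumed to share one set of parameters, and the memory optimization mentioned above is not reproduced.

```python
import math

class StepCell:
    """One time step of a toy recurrent unit: h_new = tanh(w_x * x + w_h * h)."""
    def __init__(self, params):
        self.params = params  # reference to the shared dict, not a copy

    def forward(self, x, h):
        return math.tanh(self.params["w_x"] * x + self.params["w_h"] * h)

def unroll(params, num_steps):
    """Build one replica of the step graph per time step, all sharing `params`."""
    return [StepCell(params) for _ in range(num_steps)]

def run_sequence(replicas, inputs, h0=0.0):
    """Feed each input to its own replica, passing the hidden state along."""
    h = h0
    states = []
    for cell, x in zip(replicas, inputs):
        h = cell.forward(x, h)
        states.append(h)
    return states

params = {"w_x": 0.5, "w_h": 0.1}
replicas = unroll(params, num_steps=3)
states = run_sequence(replicas, [1.0, 0.0, -1.0])
print(states)

# Because the parameters are shared, changing them once affects every replica.
params["w_x"] = 0.0
print(run_sequence(replicas, [1.0, 0.0, -1.0]))
```

Sharing the parameters by reference is what keeps the unrolled copies a single model: an update made once is seen at every time step.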

Modern libraries like PyTorch or TensorFlow make things easier now, and there are fewer details to take care of when designing a system.

(*) Here is a single artificial neuron with an input dimension of 20 and a tanh activation layer:

require 'nn'
local neuron = nn.Sequential()
  :add(nn.Linear(20, 1))  -- weighted sum of the 20 inputs plus a bias
  :add(nn.Tanh())         -- tanh activation
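For readers without Torch installed, the computation such a neuron performs (a weighted sum of 20 inputs plus a bias, passed through tanh) can be written out in plain Python. This is a sketch only: the function name neuron and the weight, bias and input values are made up for illustration, whereas a Torch nn.Linear layer initializes its weights randomly.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by tanh."""
    assert len(inputs) == len(weights)
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(pre_activation)

inputs = [0.1] * 20     # input dimension of 20, as in the example above
weights = [0.05] * 20   # illustrative values, not learned weights
bias = 0.0
output = neuron(inputs, weights, bias)
print(output)
```

The output is a single number strictly between -1 and 1, matching the output dimension of 1 in the Torch snippet.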

(May) #5

@guillaumekln Thanks for the response. I feel a lot more confident now about my understanding thanks to it.

I see now that torch modules are of an even finer granularity than a neuron. Are things like nn.Sequential() and nn.Linear(20, 1) and nn.Tanh() the modules in that example code?

It seems like I’ll have to learn a bit of torch (and the nn package) before the ONMT codebase reveals its finer details to me! However, this gives me some direction to take my studies, so thanks! Once I do this, I’ll revisit ONMT, since from what I can see, there are lots of references to the nn package.

(Guillaume Klein) #6

Yes. We usually refer to them as "nn modules".