Looking through the code for the Criterion and ParallelClassNLLCriterion classes, as well as the implementation you referenced, it appears that the logic for the cost function is deferred to the decoder. Is that correct?
If so, would I need to define a new criterion that overrides this behavior and defines the cost logic in its forward step? And does Torch provide the autograd features that are available in PyTorch? If it does, my understanding is that as long as I can express my cost function purely in matrix math (which should be easy to do), autograd would handle the gradients and I wouldn't also need to define the backward function.
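To make the question concrete, here is a minimal PyTorch sketch of the behavior I'm hoping to get. The cost itself (squared error between softmax probabilities and one-hot targets) and the name `custom_loss` are placeholders I made up for illustration; they are not taken from the codebase or from the implementation you referenced.

```python
import torch

def custom_loss(scores, targets):
    """Hypothetical cost built only from tensor ops, so autograd can differentiate it."""
    # Turn raw scores into per-row probabilities.
    probs = torch.softmax(scores, dim=1)
    # Build one-hot targets with the same shape as probs.
    one_hot = torch.zeros_like(probs).scatter_(1, targets.unsqueeze(1), 1.0)
    # Squared error per example, averaged over the batch.
    return ((probs - one_hot) ** 2).sum(dim=1).mean()

torch.manual_seed(0)
scores = torch.randn(4, 5, requires_grad=True)  # batch of 4, 5 classes
targets = torch.tensor([0, 2, 1, 4])            # class index per example

loss = custom_loss(scores, targets)
loss.backward()  # no hand-written backward; autograd fills in scores.grad
```

After `loss.backward()`, `scores.grad` holds the gradient of the cost with respect to the scores. What I'd like to know is whether I can get the equivalent in Torch, or whether a custom criterion there has to supply its own gradient computation.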