If I wanted to use a custom cost function (like perplexity, but based on a more sophisticated model), where would I put that in the code so that I can optimize for that cost function instead of optimizing for perplexity or the BLEU score?
Hi @devinbostIL - @guillaumekln has made a very nice template for Evaluators. See this comment for details. Or check out my PR incorporating DL ratio.
Evaluators are only used to compute a validation score, not the cost function as used during the training.
To change the cost function, you need to provide a Criterion. See for example:
It only works at the timestep level and not the sequence level as of now.
As I’m looking through the code for the Criterion and ParallelClassNLLCriterion classes, as well as the implementation you referenced, it appears that it’s deferring the logic for the cost function to the decoder. Is that correct?
So, would I need to define a new criterion that overrides this behavior and actually defines the logic in the forward step? And if so, does Torch provide the autograd features that are available in PyTorch? If it does, my understanding is that as long as I can represent my cost function purely with matrix math (which should be easy to do), autograd would handle the gradients and I wouldn't also need to define the backward function.
Yes, for the moment the criterion is called in the backward pass of the decoder.
You should keep the same logic.
Torch does not provide autograd, so you need to implement updateOutput and updateGradInput in the class. It is a little overhead, but in most cases very straightforward.
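To illustrate the updateOutput / updateGradInput pairing, here is a minimal sketch in plain Python rather than Lua Torch, so it runs without any dependencies. The class name SmoothedNLLCriterion and the label-smoothing loss are hypothetical, chosen only as an example of a custom per-timestep cost. It is not OpenNMT code. Like ClassNLLCriterion, it assumes the input is a vector of log-probabilities. The finite-difference check at the end is a common way to confirm that a hand-written gradient matches the forward computation.

```python
import math


class SmoothedNLLCriterion:
    """Hypothetical per-timestep criterion: label-smoothed negative log-likelihood.

    input  : list of log-probabilities over the vocabulary (one timestep)
    target : index of the reference token
    """

    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.output = 0.0
        self.gradInput = []

    def _target_dist(self, size, target):
        # Put (1 - smoothing) on the reference token, spread the rest evenly.
        off = self.smoothing / (size - 1)
        return [1.0 - self.smoothing if i == target else off for i in range(size)]

    def updateOutput(self, input, target):
        # Forward pass: loss = -sum_i q_i * log p_i
        q = self._target_dist(len(input), target)
        self.output = -sum(q_i * lp_i for q_i, lp_i in zip(q, input))
        return self.output

    def updateGradInput(self, input, target):
        # Backward pass, derived by hand: d(loss)/d(log p_i) = -q_i
        q = self._target_dist(len(input), target)
        self.gradInput = [-q_i for q_i in q]
        return self.gradInput


def numerical_grad(criterion, input, target, eps=1e-6):
    """Central finite differences on updateOutput, used to check updateGradInput."""
    grads = []
    for i in range(len(input)):
        plus = list(input)
        minus = list(input)
        plus[i] += eps
        minus[i] -= eps
        diff = criterion.updateOutput(plus, target) - criterion.updateOutput(minus, target)
        grads.append(diff / (2 * eps))
    return grads


log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
crit = SmoothedNLLCriterion(smoothing=0.1)
loss = crit.updateOutput(log_probs, 0)
analytic = crit.updateGradInput(log_probs, 0)
numeric = numerical_grad(crit, log_probs, 0)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print("loss:", loss)
print("max gradient error:", max_err)
```

In Torch the same two methods would be defined on a class derived from nn.Criterion, operating on tensors instead of lists. The gradient check carries over unchanged in spirit.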