What does this gradOutput mean, and why is this step needed?

gradOutput is self-explanatory: it holds the gradients of the loss with respect to the model outputs. This step backpropagates those gradients through the rest of the model.
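In symbols, which are introduced here for illustration and are not taken from the code: let $o$ be the decoder outputs for the whole sequence, $L$ the loss the generator and criterion compute from them, and $\theta$ the encoder and decoder parameters. gradOutput is the vector $g$ below. The step that uses it applies the chain rule as a vector-Jacobian product, which is what a PyTorch-style `backward(gradient)` call on a non-scalar tensor computes (assuming that is the autograd in use here):

$$
g = \frac{\partial L}{\partial o}, \qquad
\nabla_{\theta} L = \left(\frac{\partial o}{\partial \theta}\right)^{\!\top} g
$$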

In memoryEfficientLoss there is a loss_t.backward(). Can you explain why there are two backward() calls? I usually see just one backward() call on the loss.

The forward and backward passes of the generator are done separately for memory reasons. So the workflow is as follows:

  1. Forward the whole sequence through the encoder and decoder
  2. Forward and backward the generator for each timestep
  3. Backward the whole sequence through the decoder and encoder
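The three steps can be sketched with a toy scalar model in plain Python. Everything here is hypothetical and chosen only for illustration: the "encoder/decoder" is `o_t = w * x_t`, the "generator plus criterion" is `(v * o_t - y_t) ** 2`, and gradients are derived by hand instead of by autograd. `grad_output` plays the role of gradOutput, and the final check compares the split gradients with finite differences of the un-split loss.

```python
# Toy illustration of the split forward/backward (hypothetical model, not the real code):
#   "encoder/decoder":       o_t    = w * x_t
#   "generator + criterion": loss_t = (v * o_t - y_t) ** 2

def forward_encoder_decoder(w, xs):
    """Step 1: forward the whole sequence and keep only the outputs o_t."""
    return [w * x for x in xs]

def generator_chunk(v, outs, targets):
    """Step 2 for one chunk: forward + backward through the generator only.
    Returns the chunk loss, dloss/dv, and dloss/do_t for each timestep in the chunk."""
    loss, grad_v, grad_outs = 0.0, 0.0, []
    for o, y in zip(outs, targets):
        z = v * o
        dz = 2.0 * (z - y)          # dloss_t/dz_t
        loss += (z - y) ** 2
        grad_v += dz * o            # generator gradient, summed over timesteps
        grad_outs.append(dz * v)    # dloss_t/do_t, one slice of gradOutput
    return loss, grad_v, grad_outs

def split_backward(w, v, xs, ys, chunk_size):
    outs = forward_encoder_decoder(w, xs)             # step 1
    total_loss, grad_v, grad_output = 0.0, 0.0, []
    for start in range(0, len(outs), chunk_size):     # step 2, one chunk at a time
        end = start + chunk_size
        loss, gv, go = generator_chunk(v, outs[start:end], ys[start:end])
        total_loss += loss
        grad_v += gv
        grad_output.extend(go)
    # Step 3: push gradOutput back through the "decoder/encoder" (do_t/dw = x_t).
    grad_w = sum(g * x for g, x in zip(grad_output, xs))
    return total_loss, grad_w, grad_v

def full_loss(w, v, xs, ys):
    """The same loss computed in one piece, used only for checking."""
    return sum((v * w * x - y) ** 2 for x, y in zip(xs, ys))

if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 1.5, 0.25]
    ys = [1.0, 0.0, -2.0, 3.0, 0.5]
    w, v, eps = 0.8, -1.2, 1e-6
    loss, gw, gv = split_backward(w, v, xs, ys, chunk_size=2)
    # Central finite differences on the un-split loss.
    num_gw = (full_loss(w + eps, v, xs, ys) - full_loss(w - eps, v, xs, ys)) / (2 * eps)
    num_gv = (full_loss(w, v + eps, xs, ys) - full_loss(w, v - eps, xs, ys)) / (2 * eps)
    print(f"grad_w split={gw:.6f} numeric={num_gw:.6f}")
    print(f"grad_v split={gv:.6f} numeric={num_gv:.6f}")
```

Changing `chunk_size` (1 gives one timestep per chunk) should not change the gradients, only how much of the generator's work exists at the same time.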

Thanks, your explanation is great and very helpful.

Hi, I combined the two backward() calls into one and, as you said, I got an out-of-memory error. It is intuitive that this approach uses more memory, but could you give a quantitative example or calculation? I don't know how to tell whether my model and parameters are too big. That would be much appreciated.
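As a rough, hedged estimate: the tensors that dominate the generator's memory usually have one entry per (timestep, batch item, vocabulary word). The sizes below (sequence length 50, batch 64, vocabulary 50,000, hidden size 500, 5-timestep shards, float32) are made up for illustration, and the helper names are hypothetical; substitute your own values.

```python
MIB = 1024 ** 2

def vocab_tensor_bytes(timesteps, batch_size, vocab_size, bytes_per_elem=4):
    """Bytes for one float tensor of shape (timesteps * batch_size, vocab_size),
    e.g. the generator's log-probabilities (float32 -> 4 bytes per element)."""
    return timesteps * batch_size * vocab_size * bytes_per_elem

def linear_param_bytes(in_features, out_features, bytes_per_elem=4):
    """Bytes for the weight matrix + bias of a linear projection (e.g. hidden -> vocab)."""
    return (in_features * out_features + out_features) * bytes_per_elem

if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    seq_len, batch, vocab, hidden, shard = 50, 64, 50_000, 500, 5
    whole = vocab_tensor_bytes(seq_len, batch, vocab)
    per_shard = vocab_tensor_bytes(shard, batch, vocab)
    params = linear_param_bytes(hidden, vocab)
    print(f"one vocab-sized tensor, whole sequence: {whole / MIB:.0f} MiB")
    print(f"one vocab-sized tensor, {shard}-step shard: {per_shard / MIB:.0f} MiB")
    print(f"hidden -> vocab projection weights: {params / MIB:.0f} MiB")
```

With these numbers, one vocabulary-sized float32 tensor for the whole sequence is about 610 MiB, versus about 61 MiB for a 5-step shard, while the hidden-to-vocabulary weights take about 96 MiB. The generator's forward and backward keep more than one such tensor alive at a time (for example, the log-probabilities and their gradient), so processing the whole sequence at once multiplies that gap. To measure a real run in PyTorch, `sum(p.numel() for p in model.parameters())` counts parameters and `torch.cuda.max_memory_allocated()` reports peak allocated GPU memory.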

Hey dude, I'd really appreciate it if you could explain how the code guarantees that the backward pass can be split into two or even more parts. Thank you very much.
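One way to see why splitting is exact (the symbols are illustrative, not from the code): the loss is a sum of per-shard losses, $L = \sum_k L_k$, where shard $S_k$ is a set of timesteps and $L_k$ depends on the decoder outputs only through $o_t$ for $t \in S_k$. Let $\phi$ be the generator parameters. Because differentiation is linear:

$$
\frac{\partial L}{\partial \phi} = \sum_k \frac{\partial L_k}{\partial \phi},
\qquad
\frac{\partial L}{\partial o_t} = \frac{\partial L_k}{\partial o_t} \quad \text{for } t \in S_k
$$

The first identity is realized by calling backward() once per shard, assuming PyTorch-style autograd, where repeated backward() calls add into the existing `.grad` instead of overwriting it. The second holds because no other shard's loss depends on $o_t$, so the per-shard gradients with respect to the outputs, placed side by side, form exactly the full gradOutput. The final backward through the decoder and encoder therefore receives the same gradient a single backward on $L$ would give it; the number of shards changes peak memory, not the result (up to floating-point summation order).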