Dumb Q about training steps in NMT in general.
Q1. Are these essentially steps of optimizing a penalty, i.e. the sum of the error terms (residuals) over the word vectors / NN parameters, for all sentences?
Q2. So do intermediate saves of the model always use the entire sentence data set, just not as optimized as a later save?
Q1. A training step is one update of the model parameters.
Q2. There is no guarantee that training has seen the complete data set before a save. A save at step N only means that the model parameters have been updated N times. The exact number of sentences seen depends on the batch size.
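
To make the step/save relationship concrete, here is a minimal sketch of a training loop, assuming a PyTorch-style setup (the tiny linear model, random data, and file names are stand-ins for a real NMT model): each iteration processes one batch and performs exactly one parameter update, and a checkpoint at step N says nothing about how many full passes over the corpus have happened.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an NMT model: a tiny linear layer with an MSE loss.
model = nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

batch_size = 4
save_every = 50

for step in range(1, 201):
    # One batch of `batch_size` random "sentences" (plain vectors here, for illustration).
    src = torch.randn(batch_size, 8)
    tgt = torch.randn(batch_size, 8)

    optimizer.zero_grad()
    loss = loss_fn(model(src), tgt)   # loss on this batch only, not on the whole data set
    loss.backward()                   # gradients for all model parameters
    optimizer.step()                  # one training step = one parameter update

    if step % save_every == 0:
        # A checkpoint at step N only means N updates have happened;
        # the model has seen roughly N * batch_size sentences so far.
        torch.save(model.state_dict(), f"model_step_{step}.pt")
```

For example, with a batch size of 64, a checkpoint at step 10,000 has seen roughly 640,000 sentences, which can be far less than one full pass over a large training corpus.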