Choosing number of epochs for a stacked encoder decoder model

JeffreyJosanne · February 12, 2018, 10:33pm

I’m training an encoder-decoder model on an English-to-Japanese dataset. The results show that the baseline model (1-layer encoder, 1-layer decoder, 12 epochs) has better training accuracy than the model with a 2-layer encoder and a 3-layer decoder at epoch 12. Aren’t more layers supposed to overfit and increase the training accuracy? Or is there an empirical rule for choosing the number of epochs for a given number of encoder and decoder layers?

guillaumekln · February 13, 2018, 9:05am

Hello,

How much training data did you use?

JeffreyJosanne · February 13, 2018, 9:28am

The training set has 10,000 sentences and the validation set has 500 sentences.

guillaumekln · February 13, 2018, 11:58am

A bigger model means more parameters, which require more iterations to be properly optimized. This implies more epochs or more data.
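To put rough numbers on this, here is a minimal sketch that counts only the recurrent (LSTM) weights of the two configurations from the question, plus the number of updates that 12 epochs over 10,000 sentences provide. The embedding size, hidden size, and batch size are assumptions chosen for illustration, not values from this thread, and the count ignores embeddings, attention, and the output layer.

```python
import math

def lstm_layer_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights, and one bias vector.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def lstm_stack_params(num_layers, input_size, hidden_size):
    # The first layer reads the embeddings; later layers read the previous layer's output.
    total = lstm_layer_params(input_size, hidden_size)
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    return total

def updates_per_epoch(num_sentences, batch_size):
    # One parameter update per mini-batch; the last batch may be partial.
    return math.ceil(num_sentences / batch_size)

# Assumed sizes (hypothetical, not taken from the thread).
EMB = 500
HIDDEN = 500
BATCH = 64

# Encoder stack + decoder stack for each configuration.
baseline = lstm_stack_params(1, EMB, HIDDEN) + lstm_stack_params(1, EMB, HIDDEN)
deeper = lstm_stack_params(2, EMB, HIDDEN) + lstm_stack_params(3, EMB, HIDDEN)
steps = updates_per_epoch(10_000, BATCH) * 12

print(f"baseline recurrent params: {baseline:,}")
print(f"deeper recurrent params:   {deeper:,}")
print(f"updates in 12 epochs:      {steps:,}")
```

Under these assumptions the deeper model has 2.5 times as many recurrent weights as the baseline, yet both receive the same number of updates in 12 epochs, which is consistent with the deeper model lagging at a fixed epoch count.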

Note that 10,000 examples are not enough to train an NMT system (at least a decent one). People usually work with 1M+ sentences.

JeffreyJosanne · February 13, 2018, 12:14pm

Thanks for the reply. This is an academic assignment, which is why we are dealing with such a small corpus. Here, we are only studying the effect of multiple layers on perplexity and BLEU on the training set.

Due to GPU constraints, I was looking for a range of epoch values instead of tuning the value. Thanks again.
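One common alternative to fixing the epoch count in advance is early stopping on validation perplexity: train until the validation score stops improving for a few epochs and keep the best epoch. This is not something proposed in the thread, only a sketch of that general technique. The function name, the `patience` parameter, and the perplexity values below are all made up for illustration.

```python
def pick_stopping_epoch(val_perplexities, patience=2):
    """Return the 1-based epoch with the best validation perplexity,
    scanning until `patience` consecutive epochs fail to improve.
    Returns 0 if the history is empty."""
    best_epoch = 0
    best_ppl = float("inf")
    bad_epochs = 0
    for epoch, ppl in enumerate(val_perplexities, start=1):
        if ppl < best_ppl:
            # New best: remember it and reset the patience counter.
            best_ppl = ppl
            best_epoch = epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch

# Hypothetical per-epoch validation perplexities (not measured results).
history = [120.0, 80.0, 65.0, 60.0, 61.5, 63.0, 59.0]
print(pick_stopping_epoch(history))
```

With a larger `patience` the scan tolerates longer plateaus before stopping, at the cost of more GPU time, so the value is itself a trade-off against the compute budget.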