I have some questions about the standard hyperparameters for training the Transformer in OpenNMT-py, and about the pre-trained model from the website.
I know the FAQ lists the hyperparameters (http://opennmt.net/OpenNMT-py/FAQ.html). However, there is an inconsistency between the yellow block of hyperparameters and the “Here are what each of the parameters mean:” part just below it.
The value of “-accum_count” differs between the two: should it be 2 or 4, given the setup in the yellow block? If I understand correctly, this greatly affects how many epochs fit into the 200,000 train steps.
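To make the question concrete, here is a back-of-the-envelope sketch of why the flag matters. It assumes each train step consumes batch_size × number-of-GPUs × accum_count tokens (my understanding of how gradient accumulation works in OpenNMT-py, not something stated in the FAQ), and the batch size, GPU count, and corpus token count below are placeholder guesses, not values from the website:

```python
# Rough sketch: how -accum_count changes the effective tokens consumed per
# optimizer step, and therefore how many epochs fit in 200,000 train steps.
# All concrete numbers here are assumptions for illustration only.

def epochs_in_steps(train_steps, batch_tokens, num_gpus, accum_count, corpus_tokens):
    """Approximate epochs completed, assuming each train step consumes
    batch_tokens * num_gpus * accum_count tokens of the corpus."""
    tokens_per_step = batch_tokens * num_gpus * accum_count
    return train_steps * tokens_per_step / corpus_tokens

TRAIN_STEPS = 200_000
BATCH_TOKENS = 4096          # token batch size per GPU (assumed)
NUM_GPUS = 4                 # assumed GPU count
CORPUS_TOKENS = 200_000_000  # placeholder corpus size in tokens

print(epochs_in_steps(TRAIN_STEPS, BATCH_TOKENS, NUM_GPUS, 2, CORPUS_TOKENS))
print(epochs_in_steps(TRAIN_STEPS, BATCH_TOKENS, NUM_GPUS, 4, CORPUS_TOKENS))
```

Whatever the real corpus size is, the point stands: going from accum_count 2 to 4 doubles the tokens seen in the same 200,000 steps, so the two FAQ values describe quite different training runs.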
The pre-trained Transformer model for WMT EN->DE (http://opennmt.net/Models-py/) is said to be trained with those settings. Using that model I was able to reproduce the BLEU scores listed on the website, but when I trained a new model on the same downloadable preprocessed data with the same training settings, my scores never got as high. I am trying to figure out why.
- The accum count from Q1 may have affected the number of epochs: for how many epochs was the pre-trained model trained?
- The downloaded model is named “averaged-10-epoch.pt”. Was the model averaged over the last 80,000 steps (since “-save_checkpoint_steps” is 10000 according to the website)?
It would be amazing if someone could help!