Those considering the practicality of CPU training might be interested to know that training the model on the data in data/demo-train.t7 took just over 8 hours on a machine with the following spec: Intel Xeon X3470 (S1156, 2.93 GHz quad core) with 32 GB RAM.
Terence
Yeah... I would say CPU training is basically unreasonable for large data sets.
If CPU training is desirable, we could experiment with alternatives to the exact softmax, such as hierarchical softmax (HSM) or noise-contrastive estimation (NCE).
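To give a rough idea of why replacing the exact softmax could help on CPU, here is a minimal sketch of a two-level, class-based variant of hierarchical softmax in plain Python. It is not code from this project: the class `TwoLevelSoftmax`, its methods, the contiguous word-to-class assignment, and the random weights are all assumptions made for illustration. The point is only that scoring one target word touches about 2·sqrt(V) output vectors instead of all V.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoLevelSoftmax:
    """Hypothetical class-based factorization:
    p(w | h) = p(class(w) | h) * p(w | class(w), h).
    """

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        # Assumption: words are grouped into contiguous classes of ~sqrt(V) words.
        self.class_size = math.ceil(math.sqrt(vocab_size))
        self.num_classes = math.ceil(vocab_size / self.class_size)

        def rand_vec():
            return [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

        self.class_weights = [rand_vec() for _ in range(self.num_classes)]
        self.word_weights = [rand_vec() for _ in range(vocab_size)]

    def words_in_class(self, c):
        start = c * self.class_size
        return range(start, min(start + self.class_size, self.vocab_size))

    def prob(self, hidden, word):
        """Probability of `word` given the hidden state `hidden`."""
        c = word // self.class_size
        # Level 1: softmax over the classes only.
        p_class = softmax([dot(w, hidden) for w in self.class_weights])[c]
        # Level 2: softmax over the words inside the target class only.
        members = self.words_in_class(c)
        scores = [dot(self.word_weights[i], hidden) for i in members]
        p_word = softmax(scores)[word - members.start]
        return p_class * p_word


if __name__ == "__main__":
    model = TwoLevelSoftmax(vocab_size=10, hidden_size=4)
    hidden = [0.5, -0.2, 0.1, 0.3]
    # The factorized probabilities still form a distribution over the vocabulary.
    print(sum(model.prob(hidden, w) for w in range(10)))
```

In a real model the classes would usually be chosen by word frequency rather than by index, and the saving applies mainly to training, where only the target word's probability is needed.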
5 posts were split to a new topic: Improving performance by replacing Softmax