Practicality of CPU training

Those considering the practicality of CPU training might be interested to know that training the model on the data in data/demo-train.t7 took just over 8 hours on a machine with the following spec:
Intel Xeon X3470, S1156, 2.93 GHz Quad Core with 32GB RAM.

Yeah… I would say CPU training is basically unreasonable for large data sets.

If CPU training is desirable, we could experiment with alternatives to exact softmax, such as hierarchical softmax (HSM) or noise-contrastive estimation (NCE).
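To give a rough idea of why this could help, here is a toy sketch of a two-level (class-factored) softmax, which is the simplest form of hierarchical softmax. It is not what the toolkit implements. The class name `TwoLevelSoftmax`, its methods, and the choice to assign words to classes by index are all assumptions made for illustration; real setups usually group words by frequency. The point is only that scoring one target word needs about 2·sqrt(V) dot products instead of V.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoLevelSoftmax:
    """Hypothetical toy model: p(w | h) = p(class(w) | h) * p(w | class(w), h)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        # Assumption: words are assigned to classes by index, in equal-sized blocks.
        self.class_size = math.ceil(math.sqrt(vocab_size))
        self.n_classes = math.ceil(vocab_size / self.class_size)

        def init():
            return [rng.gauss(0.0, 0.1) for _ in range(hidden_size)]

        self.class_weights = [init() for _ in range(self.n_classes)]
        self.word_weights = [init() for _ in range(vocab_size)]

    def members(self, c):
        """Word indices belonging to class c (the last class may be smaller)."""
        start = c * self.class_size
        return range(start, min(start + self.class_size, self.vocab_size))

    def prob(self, hidden, word):
        """Probability of `word`, touching only its class and its classmates."""
        c = word // self.class_size
        p_class = softmax([dot(w, hidden) for w in self.class_weights])[c]
        members = self.members(c)
        p_in_class = softmax([dot(self.word_weights[i], hidden) for i in members])
        return p_class * p_in_class[word - members.start]

    def dot_products_per_word(self, word):
        """Dot products needed to score one target word."""
        return self.n_classes + len(self.members(word // self.class_size))


if __name__ == "__main__":
    model = TwoLevelSoftmax(vocab_size=1000, hidden_size=8)
    rng = random.Random(1)
    hidden = [rng.gauss(0.0, 1.0) for _ in range(8)]
    total = sum(model.prob(hidden, w) for w in range(model.vocab_size))
    print("probabilities sum to", round(total, 6))
    print("dot products per target word:", model.dot_products_per_word(0),
          "vs", model.vocab_size, "for a full softmax")
```

The saving applies during training, where only the probability of the known target word is needed. At translation time, finding the best word still means looking across classes, so the gain there is smaller unless the search is restricted.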

5 posts were split to a new topic: Improving performance by replacing Softmax