For anyone weighing whether CPU training is practical: training the model on the data in data/demo-train.t7 took just over 8 hours on a machine with the following spec:
Intel Xeon X3470 (Socket 1156), 2.93 GHz quad-core, 32 GB RAM.
Terence
Yeah… I'd say CPU training is basically impractical for large datasets.
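If CPU training is desirable, we could experiment with alternatives to the exact softmax, whose cost grows linearly with vocabulary size, such as hierarchical softmax (HSM) or noise-contrastive estimation (NCE).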
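To illustrate why HSM helps on CPU, here is a minimal pure-Python sketch (not this project's Torch code) of a two-level, class-factored softmax, one simple form of hierarchical softmax. It assumes words are assigned to clusters by index (`word // cluster_size`), which is purely for illustration; real setups often cluster by word frequency. All function and variable names are hypothetical, and NCE is not shown. Instead of one dot product per vocabulary word, each word's probability needs one dot product per cluster plus one per word in its cluster.

```python
import math
import random


def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def log_softmax(scores):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def exact_logprob(h, W, word):
    # Exact softmax: one dot product per vocabulary entry, so cost is O(V).
    scores = [dot(h, row) for row in W]
    return log_softmax(scores)[word], len(W)


def two_level_logprob(h, Wc, Ww, word, cluster_size):
    # Class-factored softmax: log p(w|h) = log p(c|h) + log p(w|c,h).
    # Hypothetical assignment: word w belongs to cluster w // cluster_size.
    c = word // cluster_size
    start = c * cluster_size
    stop = min(start + cluster_size, len(Ww))
    class_scores = [dot(h, row) for row in Wc]
    word_scores = [dot(h, Ww[j]) for j in range(start, stop)]
    logp = log_softmax(class_scores)[c] + log_softmax(word_scores)[word - start]
    # Dot products performed: one per cluster plus one per word in cluster c.
    return logp, len(class_scores) + len(word_scores)


if __name__ == "__main__":
    random.seed(0)
    V, d, k = 1000, 16, 32
    n_clusters = math.ceil(V / k)  # 32 clusters; the last one holds 8 words
    h = [random.gauss(0, 1) for _ in range(d)]
    Ww = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(V)]
    Wc = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_clusters)]
    _, exact_cost = exact_logprob(h, Ww, 0)
    _, hsm_cost = two_level_logprob(h, Wc, Ww, 0, k)
    print(exact_cost, hsm_cost)  # prints: 1000 64
```

With a vocabulary of V words split into roughly sqrt(V) clusters, the per-word cost drops from O(V) to about O(2·sqrt(V)), while the factored probabilities still sum to 1 over the whole vocabulary.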