I am trying to train a model on about 50k examples.
This is the training command:
python train.py -data data/demo -save_model demo-model -batch_size 256 -train_steps 20000 -report_every 500 -save_checkpoint_steps 5000 -optim adam -learning_rate 0.001 -learning_rate_decay 0.0002 -start_decay_steps 10000 -decay_steps 2000 -world_size 1 -gpu_ranks 0
Surprisingly, after training step 9500 the lr value becomes 0 (it is reported as 0.00000 at step 10000). Any idea why this happened?
[2018-11-19 14:16:17,246 INFO] Step 9000/20000; acc: 94.06; ppl: 1.22; xent: 0.20; lr: 0.00100; 4383/4090 tok/s; 11191 sec
[2018-11-19 14:16:37,362 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:20:25,384 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:24:13,719 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:26:34,727 INFO] Step 9500/20000; acc: 91.82; ppl: 1.38; xent: 0.32; lr: 0.00100; 5093/4553 tok/s; 11809 sec
[2018-11-19 14:28:01,937 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:31:50,261 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:35:38,673 INFO] Loading train dataset from data/demo.train.0.pt, number of examples: 47012
[2018-11-19 14:36:53,552 INFO] Step 10000/20000; acc: 95.38; ppl: 1.18; xent: 0.16; lr: 0.00000; 5507/5266 tok/s; 12428 sec
[2018-11-19 14:36:53,723 INFO] Loading valid dataset from data/demo.valid.0.pt, number of examples: 9411
[2018-11-19 14:37:29,796 INFO] Validation perplexity: 145.798
[2018-11-19 14:37:29,796 INFO] Validation accuracy: 51.0478
[2018-11-19 14:37:29,797 INFO] Saving checkpoint demo-model_step_10000.pt
Also, I believe “acc: 95.38” refers to training accuracy, while the validation accuracy is 51.0478.
Why is there such a big difference between the two?
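
On the lr question, here is one possible reading, sketched under assumptions. If -learning_rate_decay is a multiplicative factor that is applied once training reaches -start_decay_steps and then every -decay_steps (I have not confirmed this against the optimizer code), then the first decay gives 0.001 * 0.0002 = 2e-7, which a five-decimal log field like the one above would print as 0.00000. The function lr_at below is a hypothetical helper written for illustration; it is not part of train.py.

```python
# Values copied from the training command in this post.
learning_rate = 0.001
learning_rate_decay = 0.0002
start_decay_steps = 10000
decay_steps = 2000


def lr_at(step):
    """Hypothetical schedule (an assumption, not the confirmed train.py logic):
    multiply the lr by the decay factor at start_decay_steps and again
    every decay_steps after that."""
    if step < start_decay_steps:
        return learning_rate
    n_decays = (step - start_decay_steps) // decay_steps + 1
    return learning_rate * learning_rate_decay ** n_decays


for step in (9500, 10000):
    # Second column mimics the five-decimal lr field in the log,
    # third column shows the value in scientific notation.
    print(step, "%7.5f" % lr_at(step), "%.1e" % lr_at(step))
```

If this is what is happening, the lr is not exactly zero, just too small to show up in the log. Under the same assumption, a decay factor closer to 1 (for example 0.5) would keep it visible.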