How long should the training with the default data take?

pytorch

(Martin Wunderlich) #1

Hello,

I have installed the PyTorch version of OpenNMT and am currently trying to run the training with the demo data set. However, after several hours of training, the accuracy is still quite low and the training has not terminated.
Here are some details from the console output:

[2018-07-10 07:07:02,010 INFO] Loading train dataset from data/demo.train.pt, number of examples: 10000
[2018-07-10 07:10:33,008 INFO] Step 650, 100000; acc:  11.95; ppl: 1023.88; xent:   6.93; lr: 1.00000; 277 / 285 tok/s;   2461 sec
[2018-07-10 07:25:53,663 INFO] Step 700, 100000; acc:  12.59; ppl: 810.70; xent:   6.70; lr: 1.00000; 404 / 361 tok/s;   3381 sec
[2018-07-10 07:45:28,219 INFO] Step 750, 100000; acc:  13.16; ppl: 720.82; xent:   6.58; lr: 1.00000; 252 / 278 tok/s;   4556 sec
[2018-07-10 08:00:11,008 INFO] Loading train dataset from data/demo.train.pt, number of examples: 10000
[2018-07-10 08:04:31,541 INFO] Step 800, 100000; acc:  14.44; ppl: 640.37; xent:   6.46; lr: 1.00000;  26 /  28 tok/s;   5699 sec
[2018-07-10 08:20:39,066 INFO] Step 850, 100000; acc:  12.50; ppl: 810.22; xent:   6.70; lr: 1.00000; 584 / 524 tok/s;   6667 sec
[2018-07-10 08:40:13,886 INFO] Step 900, 100000; acc:  18.80; ppl: 698.56; xent:   6.55; lr: 1.00000; 258 / 249 tok/s;   7842 sec
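
For a rough sense of scale, the remaining time can be extrapolated from two of the log lines above. This is only a sketch. It assumes that the second number after "Step" (100000) is the total number of training steps and that the final field is the elapsed time in seconds; both are read off the log format rather than confirmed against the OpenNMT source. The regular expression and the helper names are made up for this illustration.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not taken
# from the OpenNMT source): "Step <step>, <total>; ... <elapsed> sec"
LOG_RE = re.compile(r"Step\s+(?P<step>\d+),\s*(?P<total>\d+);.*?(?P<elapsed>\d+)\s+sec")


def parse_line(line):
    """Return (step, total_steps, elapsed_seconds), or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return int(m.group("step")), int(m.group("total")), int(m.group("elapsed"))


def estimate_remaining_seconds(first, last):
    """Linear extrapolation from the average seconds per step between two log lines."""
    step0, _, t0 = first
    step1, total, t1 = last
    sec_per_step = (t1 - t0) / (step1 - step0)
    return (total - step1) * sec_per_step


first = parse_line(
    "[2018-07-10 07:10:33,008 INFO] Step 650, 100000; acc:  11.95; ppl: 1023.88; "
    "xent:   6.93; lr: 1.00000; 277 / 285 tok/s;   2461 sec"
)
last = parse_line(
    "[2018-07-10 08:40:13,886 INFO] Step 900, 100000; acc:  18.80; ppl: 698.56; "
    "xent:   6.55; lr: 1.00000; 258 / 249 tok/s;   7842 sec"
)

days = estimate_remaining_seconds(first, last) / 86400
print(f"Estimated remaining time: {days:.1f} days")  # roughly 24.7 days at this rate
```

The throughput in the log varies a lot between lines, so this is an order-of-magnitude figure at best.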

The code is running on an Amazon AWS virtual machine based on one of the preconfigured Deep Learning AMIs. The specs are: instance type “t2.xlarge”, 4 CPUs, 16 GB RAM.

So, I have two questions:

  • What is the termination criterion for the default training?
  • How long would it normally take to get there?

Thanks a lot.

Kind regards,

Martin