Perplexity explosion


(YU Chang) #1

Hi~ I am training a Chinese-to-English model with 50 million sentence pairs.

Everything worked fine in epoch 1; however, the perplexity abruptly increased in epoch 2:

[12/05/17 19:43:42 INFO] Epoch 2 ; Iteration 395700/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5624 ; Perplexity 3.39
[12/05/17 19:43:56 INFO] Epoch 2 ; Iteration 395750/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 6039 ; Perplexity 3.45
[12/05/17 19:44:08 INFO] Epoch 2 ; Iteration 395800/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5266 ; Perplexity 3.40
[12/05/17 19:44:22 INFO] Epoch 2 ; Iteration 395850/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5325 ; Perplexity 3.32
[12/05/17 19:44:36 INFO] Epoch 2 ; Iteration 395900/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5888 ; Perplexity 3.50
[12/05/17 19:44:50 INFO] Epoch 2 ; Iteration 395950/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5622 ; Perplexity 3.42
[12/05/17 19:45:03 INFO] Epoch 2 ; Iteration 396000/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5410 ; Perplexity 3.41
[12/05/17 19:45:17 INFO] Epoch 2 ; Iteration 396050/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5793 ; Perplexity 3.42
[12/05/17 19:45:31 INFO] Epoch 2 ; Iteration 396100/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5455 ; Perplexity 3.36
[12/05/17 19:45:44 INFO] Epoch 2 ; Iteration 396150/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5155 ; Perplexity 883.71
[12/05/17 19:45:56 INFO] Epoch 2 ; Iteration 396200/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5574 ; Perplexity 1874534.60
[12/05/17 19:46:11 INFO] Epoch 2 ; Iteration 396250/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5596 ; Perplexity 6408383.68
[12/05/17 19:46:25 INFO] Epoch 2 ; Iteration 396300/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5507 ; Perplexity 7882103.67
[12/05/17 19:46:38 INFO] Epoch 2 ; Iteration 396350/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5767 ; Perplexity 11097226.36
[12/05/17 19:46:52 INFO] Epoch 2 ; Iteration 396400/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5285 ; Perplexity 9547477.97
[12/05/17 19:47:06 INFO] Epoch 2 ; Iteration 396450/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5344 ; Perplexity 22019509.81
[12/05/17 19:47:20 INFO] Epoch 2 ; Iteration 396500/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5970 ; Perplexity 47101783.20
[12/05/17 19:47:34 INFO] Epoch 2 ; Iteration 396550/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5377 ; Perplexity 11989194.37
[12/05/17 19:47:49 INFO] Epoch 2 ; Iteration 396600/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5766 ; Perplexity 26234256.16
[12/05/17 19:48:03 INFO] Epoch 2 ; Iteration 396650/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5976 ; Perplexity 3002838.56
[12/05/17 19:48:18 INFO] Epoch 2 ; Iteration 396700/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5698 ; Perplexity 1396222.70
[12/05/17 19:48:32 INFO] Epoch 2 ; Iteration 396750/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5480 ; Perplexity 146913.45
[12/05/17 19:48:46 INFO] Epoch 2 ; Iteration 396800/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5803 ; Perplexity 168172.61
[12/05/17 19:48:57 INFO] Epoch 2 ; Iteration 396850/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5167 ; Perplexity 85283.03
[12/05/17 19:49:11 INFO] Epoch 2 ; Iteration 396900/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5613 ; Perplexity 95478.11
[12/05/17 19:49:25 INFO] Epoch 2 ; Iteration 396950/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5679 ; Perplexity 23972.18
[12/05/17 19:49:39 INFO] Epoch 2 ; Iteration 397000/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5642 ; Perplexity 455034.70
[12/05/17 19:49:53 INFO] Epoch 2 ; Iteration 397050/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5617 ; Perplexity 11126377.97
[12/05/17 19:50:06 INFO] Epoch 2 ; Iteration 397100/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5514 ; Perplexity 27062372.85
[12/05/17 19:50:19 INFO] Epoch 2 ; Iteration 397150/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5228 ; Perplexity 41885601.02
[12/05/17 19:50:32 INFO] Epoch 2 ; Iteration 397200/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5438 ; Perplexity 94102695.19
[12/05/17 19:50:45 INFO] Epoch 2 ; Iteration 397250/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5720 ; Perplexity 423779581.12
[12/05/17 19:51:00 INFO] Epoch 2 ; Iteration 397300/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5481 ; Perplexity 354617139.24
[12/05/17 19:51:14 INFO] Epoch 2 ; Iteration 397350/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5404 ; Perplexity 354608309.08
[12/05/17 19:51:27 INFO] Epoch 2 ; Iteration 397400/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5485 ; Perplexity 42142321.61
[12/05/17 19:51:41 INFO] Epoch 2 ; Iteration 397450/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5674 ; Perplexity 123105959.04
[12/05/17 19:51:55 INFO] Epoch 2 ; Iteration 397500/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5705 ; Perplexity 525247303.93
[12/05/17 19:52:08 INFO] Epoch 2 ; Iteration 397550/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5488 ; Perplexity 305671651.55
[12/05/17 19:52:22 INFO] Epoch 2 ; Iteration 397600/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5524 ; Perplexity 845252711.23
[12/05/17 19:52:37 INFO] Epoch 2 ; Iteration 397650/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 5504 ; Perplexity 2713147551.43

This is the command I use to train / retrain:

th train.lua -data /data/openNMT/cn_en/preprocess/full_cn_en-train.t7 -save_model /data/openNMT/cn_en/models/full_cn_en-model -encoder_type brnn -gpuid 2 -train_from /data/openNMT/cn_en/models/full_cn_en-model_epoch1_3.10.t7 -continue -log_file /data/openNMT/cn_en/logs/train_cn_en_full_continue_from_epoch1.log 

I trained an English-to-Chinese model at the same time, and its perplexity also exploded in epoch 2.

I have never encountered this problem before. Does anybody know why this happens, and how to solve it? Should I set -max_grad_norm to a smaller / bigger value?

Any advice or guidance would be greatly appreciated.
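
For context, as I understand it -max_grad_norm rescales the gradients whenever their combined L2 norm exceeds the threshold. A minimal pure-Python sketch of that global-norm clipping, assuming the usual rescale-by-global-norm behavior (the function name and list-of-lists layout here are just for illustration, not OpenNMT's actual internals):

```python
import math

def clip_global_norm(grads, max_norm):
    # grads: list of gradient vectors (lists of floats).
    # Compute the global L2 norm over all gradients together.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # If it exceeds max_norm, scale every gradient down uniformly,
    # preserving the direction of the update.
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm

# A gradient that has blown up (global norm 50) gets rescaled to norm 5:
clipped, norm = clip_global_norm([[30.0, 40.0]], 5.0)
```

With a lower threshold, a single bad batch moves the weights less, so it may delay the blow-up, but it does not fix whatever is making the raw gradients huge in the first place.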


(Guillaume Klein) #2

Hello,

If you restart the second epoch, do you consistently get the same behavior?


(YU Chang) #3

Thank you for your reply.

The log I posted is the output of retraining, which continued from epoch 1.

So yes, I got the same problem when restarting the second epoch.

Now I am training the same data with this command:

th train.lua -data /data/openNMT/cn_en/preprocess/full_cn_en-train.t7 -save_model /data/openNMT/cn_en/models/full_cn_en-model -encoder_type brnn -layers 4 -residual true -gpuid 2 -log_file /data/openNMT/cn_en/logs/train_cn_en_full.log

Hopefully the perplexity will not explode again this time.


(YU Chang) #4

The perplexity exploded again, and this time it turned into nan after a while.

[12/12/17 12:38:10 INFO] Epoch 2 ; Iteration 547700/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3751 ; Perplexity 4.60
[12/12/17 12:38:29 INFO] Epoch 2 ; Iteration 547750/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3985 ; Perplexity 5.03
[12/12/17 12:38:50 INFO] Epoch 2 ; Iteration 547800/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4219 ; Perplexity 5.07
[12/12/17 12:39:10 INFO] Epoch 2 ; Iteration 547850/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3939 ; Perplexity 4.75
[12/12/17 12:39:28 INFO] Epoch 2 ; Iteration 547900/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3965 ; Perplexity 4.65
[12/12/17 12:39:48 INFO] Epoch 2 ; Iteration 547950/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3989 ; Perplexity 4.92
[12/12/17 12:40:05 INFO] Epoch 2 ; Iteration 548000/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3912 ; Perplexity 4.56
[12/12/17 12:40:23 INFO] Epoch 2 ; Iteration 548050/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3909 ; Perplexity 4.71
[12/12/17 12:40:43 INFO] Epoch 2 ; Iteration 548100/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3985 ; Perplexity 8.55
[12/12/17 12:41:01 INFO] Epoch 2 ; Iteration 548150/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3921 ; Perplexity 18.50
[12/12/17 12:41:20 INFO] Epoch 2 ; Iteration 548200/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3860 ; Perplexity 72.73
[12/12/17 12:41:38 INFO] Epoch 2 ; Iteration 548250/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4156 ; Perplexity 633.57
[12/12/17 12:41:58 INFO] Epoch 2 ; Iteration 548300/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4004 ; Perplexity 11996.44
[12/12/17 12:42:16 INFO] Epoch 2 ; Iteration 548350/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4047 ; Perplexity 23356.12
[12/12/17 12:42:35 INFO] Epoch 2 ; Iteration 548400/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4003 ; Perplexity 15903.40
[12/12/17 12:42:54 INFO] Epoch 2 ; Iteration 548450/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3968 ; Perplexity 18333.47
[12/12/17 12:43:12 INFO] Epoch 2 ; Iteration 548500/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4007 ; Perplexity 14262.54
[12/12/17 12:43:31 INFO] Epoch 2 ; Iteration 548550/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3943 ; Perplexity 29962.23
[12/12/17 12:43:49 INFO] Epoch 2 ; Iteration 548600/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3976 ; Perplexity 55310.36
[12/12/17 12:44:08 INFO] Epoch 2 ; Iteration 548650/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3946 ; Perplexity 80715.05
[12/12/17 12:44:28 INFO] Epoch 2 ; Iteration 548700/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4040 ; Perplexity 207936.22
[12/12/17 12:44:45 INFO] Epoch 2 ; Iteration 548750/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3871 ; Perplexity 315597.60
[12/12/17 12:45:06 INFO] Epoch 2 ; Iteration 548800/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4043 ; Perplexity 445907.52
[12/12/17 12:45:24 INFO] Epoch 2 ; Iteration 548850/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4119 ; Perplexity 275662.15
[12/12/17 12:45:42 INFO] Epoch 2 ; Iteration 548900/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3916 ; Perplexity 695460.16
[12/12/17 12:46:02 INFO] Epoch 2 ; Iteration 548950/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4113 ; Perplexity 959504.41
[12/12/17 12:46:21 INFO] Epoch 2 ; Iteration 549000/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4032 ; Perplexity 474672.42
[12/12/17 12:46:40 INFO] Epoch 2 ; Iteration 549050/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3870 ; Perplexity 735283.30
[12/12/17 12:46:58 INFO] Epoch 2 ; Iteration 549100/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4252 ; Perplexity 1170335.11
..............
[12/12/17 13:29:38 INFO] Epoch 2 ; Iteration 555850/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4090 ; Perplexity 226715875.49
[12/12/17 13:29:56 INFO] Epoch 2 ; Iteration 555900/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4062 ; Perplexity 241014213.36
[12/12/17 13:30:15 INFO] Epoch 2 ; Iteration 555950/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4088 ; Perplexity 218984057.24
[12/12/17 13:30:36 INFO] Epoch 2 ; Iteration 556000/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4099 ; Perplexity 202842780.47
[12/12/17 13:30:54 INFO] Epoch 2 ; Iteration 556050/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3767 ; Perplexity 176270456.18
[12/12/17 13:31:15 INFO] Epoch 2 ; Iteration 556100/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4163 ; Perplexity 209369281.50
[12/12/17 13:31:34 INFO] Epoch 2 ; Iteration 556150/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3967 ; Perplexity 188804608.44
[12/12/17 13:31:52 INFO] Epoch 2 ; Iteration 556200/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3923 ; Perplexity 190390276.20
[12/12/17 13:32:12 INFO] Epoch 2 ; Iteration 556250/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4115 ; Perplexity nan
[12/12/17 13:32:31 INFO] Epoch 2 ; Iteration 556300/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3997 ; Perplexity nan
[12/12/17 13:32:49 INFO] Epoch 2 ; Iteration 556350/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4044 ; Perplexity nan
[12/12/17 13:33:08 INFO] Epoch 2 ; Iteration 556400/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3795 ; Perplexity nan
[12/12/17 13:33:27 INFO] Epoch 2 ; Iteration 556450/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4282 ; Perplexity nan
[12/12/17 13:33:45 INFO] Epoch 2 ; Iteration 556500/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 3914 ; Perplexity nan
[12/12/17 13:34:03 INFO] Epoch 2 ; Iteration 556550/791100 ; Optim SGD LR 1.000000 ; Source tokens/s 4120 ; Perplexity nan

I searched other posts in the forum, and retrained with a reduced learning rate:

th train.lua -data /data/openNMT/cn_en/preprocess/full_cn_en-train.t7 -save_model /data/openNMT/cn_en/models/full_cn_en-model -encoder_type brnn -layers 4 -residual true -gpuid 2 -train_from /data/openNMT/cn_en/models/full_cn_en-model_epoch1_3.64.t7 -start_epoch 2 -learning_rate 0.5 -log_file /data/openNMT/cn_en/logs/continue_epoch1_train_cn_en_full.log

I will report the retraining result later.
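
For reference, the perplexity reported here is simply exp of the average per-token cross-entropy (in nats), so a fairly modest rise in the loss shows up as an astronomical perplexity, and a single nan loss poisons every report after it. A minimal sketch of that relationship (the helper name is my own, not OpenNMT's):

```python
import math

def perplexity(mean_nll):
    # Perplexity = exp(average negative log-likelihood per token, in nats).
    return math.exp(mean_nll)

healthy = perplexity(1.22)    # about 3.39, matching the early log lines
exploded = perplexity(21.7)   # roughly 2.7e9, the blown-up regime

# Once one batch yields a nan loss, the running average is nan forever,
# which is exactly the nan tail seen in the log above.
poisoned = sum([1.2, 1.3, float("nan")]) / 3
```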


(Eva) #5

Hi @huache!

I had a similar problem some time ago. In my case it had to do with the learning rate and the training algorithm. Using SGD with learning rate 1 produced a gradient explosion during training, which was reflected in the perplexity explosion.

If changing the learning rate doesn’t work, I would suggest trying the adam or adagrad algorithms for your training. These adaptive algorithms adjust the learning rate at each training step and can control this kind of gradient explosion.
Another option is to keep using SGD with a smaller learning rate until you find one that works for you.
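
To illustrate why the adaptive methods help: Adagrad divides each step by the root of the accumulated squared gradients, so a parameter that keeps receiving huge gradients automatically gets a shrinking effective step. A minimal single-parameter sketch, assuming the textbook Adagrad update (not any particular library's implementation):

```python
import math

def adagrad_step(param, grad, accum, lr=1.0, eps=1e-8):
    # Accumulate the squared gradient, then scale the step by its root:
    # the bigger the gradient history, the smaller the effective step.
    accum += grad * grad
    param -= lr * grad / (math.sqrt(accum) + eps)
    return param, accum

p, acc = 0.0, 0.0
for g in [100.0, 100.0, 100.0]:   # huge, repeated gradients
    p, acc = adagrad_step(p, g, acc)
# Effective steps: 1.0, ~0.707, ~0.577 -- bounded, whereas plain SGD
# with lr=1 would have moved the parameter by 300.
```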

good luck!
Eva


(YU Chang) #6

Thank you so much for your enlightening advice! :tulip::tulip::tulip: I’ll follow it.