Hi,

I ran into a problem when training with the -fp16 option.

I am training my model on a Tesla P100, so I expected training to be about 2x faster with fp16, but that is not what I see.

If I train my model **without** the fp16 option:

[01/08/18 19:31:22 INFO] Epoch 1 ; Iteration 450/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3227 ; Perplexity 26296.91

[01/08/18 19:31:48 INFO] Epoch 1 ; Iteration 500/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3032 ; Perplexity 43565.00

[01/08/18 19:32:10 INFO] Epoch 1 ; Iteration 550/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3251 ; Perplexity 30608.28

[01/08/18 19:32:34 INFO] Epoch 1 ; Iteration 600/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3046 ; Perplexity 27288.81

[01/08/18 19:32:58 INFO] Epoch 1 ; Iteration 650/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3226 ; Perplexity 9511.65

[01/08/18 19:33:22 INFO] Epoch 1 ; Iteration 700/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3067 ; Perplexity 3838.16

[01/08/18 19:33:46 INFO] Epoch 1 ; Iteration 750/26079 ; Optim SGD LR 1.000000 ; Source tokens/s 3029 ; Perplexity 1993.95

**With** the fp16 option:

[01/11/18 16:17:24 INFO] Epoch 1 ; Iteration 450/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1767 ; Perplexity nan

[01/11/18 16:18:10 INFO] Epoch 1 ; Iteration 500/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1769 ; Perplexity nan

[01/11/18 16:18:57 INFO] Epoch 1 ; Iteration 550/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1745 ; Perplexity nan

[01/11/18 16:19:44 INFO] Epoch 1 ; Iteration 600/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1751 ; Perplexity nan

[01/11/18 16:20:32 INFO] Epoch 1 ; Iteration 650/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1774 ; Perplexity nan

[01/11/18 16:21:23 INFO] Epoch 1 ; Iteration 700/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1752 ; Perplexity nan

[01/11/18 16:22:12 INFO] Epoch 1 ; Iteration 750/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1777 ; Perplexity nan

[01/11/18 16:22:59 INFO] Epoch 1 ; Iteration 800/26080 ; Optim SGD LR 1.000000 ; Source tokens/s 1794 ; Perplexity nan

With fp16, 50 iterations take about 47s, but without fp16, 50 iterations take only about 24s, so fp16 is roughly 2x slower instead of 2x faster.
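
As a rough check of these numbers, the per-50-iteration timings can be recomputed from the timestamps in the logs above. This is only a small sketch: the timestamps are copied by hand from the log lines, and the helper name `mean_interval_seconds` is made up for illustration, not something from the training toolkit.

```python
from datetime import datetime

# Timestamps copied from the log lines above; consecutive entries are 50 iterations apart.
FP32_TIMES = ["19:31:22", "19:31:48", "19:32:10", "19:32:34",
              "19:32:58", "19:33:22", "19:33:46"]
FP16_TIMES = ["16:17:24", "16:18:10", "16:18:57", "16:19:44",
              "16:20:32", "16:21:23", "16:22:12", "16:22:59"]


def mean_interval_seconds(timestamps):
    """Average gap in seconds between consecutive log timestamps (hypothetical helper)."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]
    return sum(gaps) / len(gaps)


fp32_mean = mean_interval_seconds(FP32_TIMES)
fp16_mean = mean_interval_seconds(FP16_TIMES)
slowdown = fp16_mean / fp32_mean

print(f"without fp16: {fp32_mean:.1f}s per 50 iterations")
print(f"with fp16:    {fp16_mean:.1f}s per 50 iterations")
print(f"slowdown:     {slowdown:.2f}x")
```

Averaged this way, the runs come out at about 24s and just under 48s per 50 iterations, which matches the roughly 2x slowdown described above.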

Also, with fp16 the perplexity is always nan, which is strange.
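
One possible (unverified) explanation for the nan is the limited range of half precision: the largest finite fp16 value is 65504, so a large intermediate value can overflow to inf, and an operation such as inf - inf then yields nan. The sketch below only illustrates that mechanism with Python's standard `struct` module. It is an assumption that this is what happens in this run, and `to_fp16` is a hypothetical helper, not part of any training code.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to half precision (hypothetical helper).

    struct raises OverflowError for out-of-range values, so we map that
    case to +/-inf, which is what IEEE 754 arithmetic would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


in_range = to_fp16(FP16_MAX)          # still representable
overflowed = to_fp16(70000.0)         # beyond the fp16 range, becomes inf
poisoned = overflowed - overflowed    # inf - inf is nan

print(in_range, overflowed, poisoned)
```

If something like this is the cause, a nan early in training would propagate through every later update, which would be consistent with the perplexity never recovering in the log above.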

Has anyone tested fp16? Did I do something wrong?

Thanks :)