Hi,
I’ve been training models for zh-en translation. My dataset combines the UN corpus and News Commentary v14, with sentence pairs longer than 50 tokens removed. I applied BPE to the English side and segmented the Chinese sentences with jieba, then trained with the following parameter settings.
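For context, my length filter looks roughly like the sketch below. In the real pipeline the Chinese side is tokenized with jieba (jieba.lcut) before filtering; a plain whitespace split stands in here so the snippet is self-contained.

```python
# Minimal sketch of the length filtering applied before training.
# Assumption: both sides are already tokenized (jieba for Chinese in my
# actual pipeline), so a whitespace split recovers the tokens.

MAX_LEN = 50

def keep_pair(src_line: str, tgt_line: str, max_len: int = MAX_LEN) -> bool:
    """Keep a sentence pair only if both sides are at most max_len tokens."""
    return len(src_line.split()) <= max_len and len(tgt_line.split()) <= max_len

def filter_corpus(src_lines, tgt_lines, max_len=MAX_LEN):
    """Yield parallel pairs that survive the length filter."""
    for src, tgt in zip(src_lines, tgt_lines):
        if keep_pair(src, tgt, max_len):
            yield src, tgt

# Example: the 60-token pair is dropped, the short one is kept.
src = ["这 是 一个 短 句", " ".join(["词"] * 60)]
tgt = ["this is a short sentence", " ".join(["word"] * 60)]
print(len(list(filter_corpus(src, tgt))))  # → 1
```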
encoder_type: brnn
decoder_type: rnn
word_vec_size: 512
rnn_size: 512
enc_layers: 6
dec_layers: 3
optim: adam
learning_rate: 0.0002
label_smoothing: 0.1
batch_size: 4096
batch_type: tokens
dropout: 0.1
global_attention: dot
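For reference, these settings correspond to an OpenNMT-py (legacy train.py) invocation roughly like the one below; the -data and -save_model prefixes are the paths that appear in my log, and -train_steps matches the 500000-step schedule.

```shell
# Roughly the command I run (OpenNMT-py 0.x style flags; paths are mine).
python train.py \
    -data data/wmt19_zhen.atok.low \
    -save_model models/BiRNN-UN/BiRNN \
    -encoder_type brnn -decoder_type rnn \
    -word_vec_size 512 -rnn_size 512 \
    -enc_layers 6 -dec_layers 3 \
    -optim adam -learning_rate 0.0002 \
    -label_smoothing 0.1 \
    -batch_size 4096 -batch_type tokens \
    -dropout 0.1 -global_attention dot \
    -train_steps 500000
```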
However, the perplexity stops decreasing after 75,000 steps, and the BLEU score on the test data is only 11.57. Here is the training log:
[2019-04-02 18:43:39,593 INFO] Step 74000/500000; acc: 67.26; ppl: 4.36; xent: 1.47; lr: 0.00003; 13813/16283 tok/s; 18576 sec
[2019-04-02 18:47:42,485 INFO] Step 75000/500000; acc: 67.77; ppl: 4.24; xent: 1.44; lr: 0.00003; 13767/16230 tok/s; 18819 sec
[2019-04-02 18:47:42,502 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002
[2019-04-02 18:48:00,031 INFO] Validation perplexity: 64.1911
[2019-04-02 18:48:00,031 INFO] Validation accuracy: 35.7932
[2019-04-02 18:48:00,032 INFO] Saving checkpoint models/BiRNN-UN/BiRNN_step_75000.pt
[2019-04-02 18:52:06,883 INFO] Step 76000/500000; acc: 67.70; ppl: 4.25; xent: 1.45; lr: 0.00003; 12701/14915 tok/s; 19083 sec
[2019-04-02 18:56:09,805 INFO] Step 77000/500000; acc: 68.10; ppl: 4.15; xent: 1.42; lr: 0.00003; 13870/16283 tok/s; 19326 sec
[2019-04-02 19:00:14,300 INFO] Step 78000/500000; acc: 68.05; ppl: 4.19; xent: 1.43; lr: 0.00003; 13568/16095 tok/s; 19571 sec
[2019-04-02 19:03:26,265 INFO] Loading dataset from data/wmt19_zhen.atok.low.train.10.pt, number of examples: 264091
[2019-04-02 19:04:25,883 INFO] Step 79000/500000; acc: 68.39; ppl: 4.11; xent: 1.41; lr: 0.00003; 13266/15535 tok/s; 19822 sec
[2019-04-02 19:08:29,307 INFO] Step 80000/500000; acc: 69.27; ppl: 3.98; xent: 1.38; lr: 0.00001; 13900/16097 tok/s; 20066 sec
[2019-04-02 19:08:29,324 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002
[2019-04-02 19:08:36,580 INFO] Validation perplexity: 63.9562
[2019-04-02 19:08:36,580 INFO] Validation accuracy: 36.0509
[2019-04-02 19:08:36,580 INFO] Saving checkpoint models/BiRNN-UN/BiRNN_step_80000.pt
[2019-04-02 19:10:49,863 INFO] Loading dataset from data/wmt19_zhen.atok.low.train.2.pt, number of examples: 989904
[2019-04-02 19:13:00,629 INFO] Step 81000/500000; acc: 68.89; ppl: 4.02; xent: 1.39; lr: 0.00001; 12409/14471 tok/s; 20337 sec
[2019-04-02 19:17:00,303 INFO] Step 82000/500000; acc: 67.65; ppl: 4.21; xent: 1.44; lr: 0.00001; 14095/16496 tok/s; 20577 sec
[2019-04-02 19:20:59,448 INFO] Step 83000/500000; acc: 67.95; ppl: 4.16; xent: 1.43; lr: 0.00001; 14117/16539 tok/s; 20816 sec
[2019-04-02 19:24:58,208 INFO] Step 84000/500000; acc: 68.02; ppl: 4.15; xent: 1.42; lr: 0.00001; 14174/16536 tok/s; 21055 sec
[2019-04-02 19:28:54,172 INFO] Step 85000/500000; acc: 68.09; ppl: 4.12; xent: 1.41; lr: 0.00001; 14279/16730 tok/s; 21291 sec
[2019-04-02 19:28:54,188 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002
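One thing I notice: validation perplexity is essentially flat between the two checkpoints above (64.19 → 63.96), while training perplexity sits around 4. A quick parse confirms this; the two log lines are inlined here as an assumption about the log format.

```python
import re

# Two "Validation perplexity" lines copied from the training log above.
LOG = """\
[2019-04-02 18:48:00,031 INFO] Validation perplexity: 64.1911
[2019-04-02 19:08:36,580 INFO] Validation perplexity: 63.9562
"""

def val_perplexities(log_text: str) -> list[float]:
    """Return all validation perplexity values, in order of appearance."""
    return [float(m) for m in re.findall(r"Validation perplexity: ([\d.]+)", log_text)]

ppls = val_perplexities(LOG)
print(ppls)                 # → [64.1911, 63.9562]
print(ppls[0] - ppls[-1])   # improvement over 5000 steps: ~0.23
```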
Could anybody tell me what the issue might be? Thanks a lot!
Nick