Poor BLEU score with WMT Chinese-English translation

Hi,

I’ve been training models for zh-en translation. I built the training data from the UN corpus and News Commentary v14, removed sentence pairs longer than 50 tokens, and applied BPE on the English side. I segmented the Chinese sentences with jieba (a rough sketch of my preprocessing is below) and trained with the parameter settings that follow it.
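
In case it matters, my preprocessing is roughly the following. This is a minimal sketch: the file names are placeholders, and I'm assuming the 50-token cutoff applies to both sides of a pair.

import jieba

MAX_LEN = 50  # drop sentence pairs with more than 50 tokens on either side

with open("train.zh", encoding="utf-8") as f_zh, \
     open("train.en", encoding="utf-8") as f_en, \
     open("train.tok.zh", "w", encoding="utf-8") as out_zh, \
     open("train.tok.en", "w", encoding="utf-8") as out_en:
    for zh, en in zip(f_zh, f_en):
        zh_tok = " ".join(jieba.cut(zh.strip()))  # jieba word segmentation on the Chinese side
        en_tok = en.strip()                       # BPE is applied to the English side afterwards
        if len(zh_tok.split()) <= MAX_LEN and len(en_tok.split()) <= MAX_LEN:
            out_zh.write(zh_tok + "\n")
            out_en.write(en_tok + "\n")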

encoder_type: brnn
decoder_type: rnn
word_vec_size: 512
rnn_size: 512
enc_layers: 6
dec_layers: 3

optim: adam
learning_rate: 0.0002

label_smoothing: 0.1
batch_size: 4096
batch_type: tokens
dropout: 0.1
global_attention: dot
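
For completeness, with OpenNMT-py (as of early 2019) these options correspond to a train.py call roughly like the one below. The -data and -save_model paths and the step/validation intervals are guessed from the log; the rest mirrors the settings above.

python train.py -data data/wmt19_zhen.atok.low -save_model models/BiRNN-UN/BiRNN \
    -encoder_type brnn -decoder_type rnn \
    -word_vec_size 512 -rnn_size 512 -enc_layers 6 -dec_layers 3 \
    -optim adam -learning_rate 0.0002 \
    -label_smoothing 0.1 -batch_size 4096 -batch_type tokens \
    -dropout 0.1 -global_attention dot \
    -train_steps 500000 -valid_steps 5000 -save_checkpoint_steps 5000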

However, the perplexity stops decreasing after about 75,000 steps, and the BLEU score on the test set is only 11.57. Here is the training log.

[2019-04-02 18:43:39,593 INFO] Step 74000/500000; acc:  67.26; ppl:  4.36; xent: 1.47; lr: 0.00003; 13813/16283 tok/s;  18576 sec
[2019-04-02 18:47:42,485 INFO] Step 75000/500000; acc:  67.77; ppl:  4.24; xent: 1.44; lr: 0.00003; 13767/16230 tok/s;  18819 sec
[2019-04-02 18:47:42,502 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002
[2019-04-02 18:48:00,031 INFO] Validation perplexity: 64.1911
[2019-04-02 18:48:00,031 INFO] Validation accuracy: 35.7932
[2019-04-02 18:48:00,032 INFO] Saving checkpoint models/BiRNN-UN/BiRNN_step_75000.pt
[2019-04-02 18:52:06,883 INFO] Step 76000/500000; acc:  67.70; ppl:  4.25; xent: 1.45; lr: 0.00003; 12701/14915 tok/s;  19083 sec
[2019-04-02 18:56:09,805 INFO] Step 77000/500000; acc:  68.10; ppl:  4.15; xent: 1.42; lr: 0.00003; 13870/16283 tok/s;  19326 sec
[2019-04-02 19:00:14,300 INFO] Step 78000/500000; acc:  68.05; ppl:  4.19; xent: 1.43; lr: 0.00003; 13568/16095 tok/s;  19571 sec
[2019-04-02 19:03:26,265 INFO] Loading dataset from data/wmt19_zhen.atok.low.train.10.pt, number of examples: 264091
[2019-04-02 19:04:25,883 INFO] Step 79000/500000; acc:  68.39; ppl:  4.11; xent: 1.41; lr: 0.00003; 13266/15535 tok/s;  19822 sec
[2019-04-02 19:08:29,307 INFO] Step 80000/500000; acc:  69.27; ppl:  3.98; xent: 1.38; lr: 0.00001; 13900/16097 tok/s;  20066 sec
[2019-04-02 19:08:29,324 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002
[2019-04-02 19:08:36,580 INFO] Validation perplexity: 63.9562
[2019-04-02 19:08:36,580 INFO] Validation accuracy: 36.0509
[2019-04-02 19:08:36,580 INFO] Saving checkpoint models/BiRNN-UN/BiRNN_step_80000.pt
[2019-04-02 19:10:49,863 INFO] Loading dataset from data/wmt19_zhen.atok.low.train.2.pt, number of examples: 989904
[2019-04-02 19:13:00,629 INFO] Step 81000/500000; acc:  68.89; ppl:  4.02; xent: 1.39; lr: 0.00001; 12409/14471 tok/s;  20337 sec
[2019-04-02 19:17:00,303 INFO] Step 82000/500000; acc:  67.65; ppl:  4.21; xent: 1.44; lr: 0.00001; 14095/16496 tok/s;  20577 sec
[2019-04-02 19:20:59,448 INFO] Step 83000/500000; acc:  67.95; ppl:  4.16; xent: 1.43; lr: 0.00001; 14117/16539 tok/s;  20816 sec
[2019-04-02 19:24:58,208 INFO] Step 84000/500000; acc:  68.02; ppl:  4.15; xent: 1.42; lr: 0.00001; 14174/16536 tok/s;  21055 sec
[2019-04-02 19:28:54,172 INFO] Step 85000/500000; acc:  68.09; ppl:  4.12; xent: 1.41; lr: 0.00001; 14279/16730 tok/s;  21291 sec
[2019-04-02 19:28:54,188 INFO] Loading dataset from data/wmt19_zhen.atok.low.valid.0.pt, number of examples: 2002

Could anybody tell me what the issue is here? Thanks a lot!

Nick

The UN corpus alone is not good enough for this task: your validation perplexity (around 64) is far above your training perplexity (around 4), which suggests the model is overfitting to the UN domain while the news-domain validation set looks quite different. You also need to train a Transformer rather than an RNN.
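
If you go the Transformer route, the base settings recommended in the OpenNMT-py FAQ for replicating the "Attention is All You Need" setup are roughly the following. Treat this as a starting point rather than a guarantee, and adjust batch_size and accum_count to your GPU memory.

encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048
position_encoding: true

optim: adam
adam_beta2: 0.998
decay_method: noam
warmup_steps: 8000
learning_rate: 2
max_grad_norm: 0

label_smoothing: 0.1
batch_size: 4096
batch_type: tokens
normalization: tokens
accum_count: 2
dropout: 0.1
param_init: 0
param_init_glorot: true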

Read this paper.

You really need to read a lot of papers to better understand the state of the art.
Cheers.

Thank you so much. I’ll try to read some more papers and augment my dataset.

Best,
Nick