I've trained an English->Japanese engine; the BLEU score is 46.
I used 1.45 million sentences, with 5 layers and batch_size 1000, during training.
After a pilot translation with the trained engine, the results seem completely unacceptable: many unknowns and mistranslations. See the comparison below for reference:
Left: Human translation
Right: NMT translation
I also have some other engines trained with the same parameters, e.g. English->German and English->Spanish, and their translation results are generally good.
I'm not sure whether there are special steps for processing Asian languages (Chinese, Japanese, Korean) in NMT.
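
One possible factor, offered only as an assumption since the preprocessing is not described above: Japanese is normally written without spaces between words, so a tokenizer that splits on whitespace would treat a whole sentence as a single token. The sketch below just illustrates that effect with plain Python string splitting; the example sentences and variable names are made up for illustration and do not come from the training data.

```python
# Illustration only: assumes whitespace tokenization, which may not
# match the preprocessing actually used for the engine.
english = "This is a pen."
japanese = "これはペンです。"  # no spaces between words

en_tokens = english.split()   # splits on whitespace
ja_tokens = japanese.split()  # nothing to split on, stays one token

print(len(en_tokens), en_tokens)
print(len(ja_tokens), ja_tokens)

# Crude character-level split, shown only for contrast
# (a hypothetical fallback, not a recommended segmentation).
ja_chars = list(japanese)
print(len(ja_chars), ja_chars)
```

If the Japanese side really was tokenized this way, most target tokens would be whole sentences seen only once, which would be consistent with many unknowns in the output.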