To work around the brnn training problem, I tried using only one GPU for training, and everything worked fine this time.
However, I found that training was actually a little faster than with two GPUs.
To confirm this, I trained a new model again with all default parameters (the same commands as in the quick start, only changing the data). Now I am fairly sure that, when training a seq2seq model on my data, two GPUs are slower than one.
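For what it's worth, here is roughly how I compared the two runs: time a fixed number of training steps and compare steps/sec. This is just an illustrative sketch, not OpenNMT's actual code; `step_fn` is a hypothetical stand-in for one optimizer step.

```python
import time

def steps_per_second(step_fn, n_steps=50, warmup=5):
    """Time repeated calls to step_fn (one training step) and
    return throughput in steps/sec, skipping warmup iterations."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return n_steps / (time.perf_counter() - start)

# Placeholder step; in practice this would be one forward/backward
# pass of the seq2seq model on one or two GPUs.
rate = steps_per_second(lambda: time.sleep(0.001))
print(f"{rate:.1f} steps/sec")
```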
Could somebody tell me: is this normal? Why does this happen? Or did I do something wrong?