Translate.py runs faster on CPU than GPU

I found that translate.py runs faster on CPU than on GPU (OpenNMT-py ver 0.2).
On GPU it needs 20 seconds for one batch, while on CPU it takes just 17.3 seconds.
(Options: -batch_size 32 -share_vocab -max_length 50 -block_ngram_repeat 5 -beam_size 5)
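
For anyone who wants to reproduce per-batch numbers like these, here is a minimal timing sketch. It is not taken from OpenNMT-py: `time_batches` and `run_batch` are hypothetical names, and `run_batch` stands in for whatever function translates one batch in your setup.

```python
import time


def time_batches(run_batch, batches, warmup=1):
    """Return wall-clock seconds per batch, skipping the first `warmup` batches.

    `run_batch` is a placeholder for your own per-batch translate call.
    On GPU, PyTorch launches kernels asynchronously, so `run_batch` should
    call torch.cuda.synchronize() before returning, otherwise the measured
    time can be too short.
    """
    timings = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        run_batch(batch)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # early batches often include one-time setup cost
            timings.append(elapsed)
    return timings
```

Comparing the averages of the returned lists for a CPU run and a GPU run on the same batches should give a fairer picture than timing a single batch.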

So I decided to run model training on the GPU and translate.py on the CPU at the same time.
Both are working well.
Training runs as usual, and translation runs faster than usual.
I also don't need to worry about running out of GPU memory.
I will test it this weekend.
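
One way to make sure a second process stays off the GPU while training is using it is to hide the CUDA devices from that process. The sketch below is an assumption about how this could be set up, not something from OpenNMT-py itself: `cpu_only_env` is a hypothetical helper, and the command in the comment is a placeholder for your own translate invocation.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    With CUDA_VISIBLE_DEVICES set to an empty string, CUDA programs
    (including PyTorch) see no GPUs, so the process has to run on CPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


# Example use (placeholder command, fill in your own arguments):
#   import subprocess
#   subprocess.Popen(["python", "translate.py", ...], env=cpu_only_env())
```

The training process is started normally and keeps the GPU to itself, so the translate process cannot cause an out-of-memory error on the device.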