Comparing OpenNMT and PyOpenNMT efficiency

I ran three tests to compare the efficiency of OpenNMT and PyOpenNMT (and, indirectly, of LuaTorch and PyTorch). Here is the configuration I used:

- Dataset: 200K FREN corpus (tokenized with `tokenize.lua`)
- Framework: latest versions of Torch and PyTorch (3330287d, compiled from source with CUDA 8.0.44 and cuDNN 5.1.5)
- Versions: OpenNMT@935b69cf, PyOpenNMT@328f059e
- Hardware: server with a GTX 1080 (driver 375.26)

The results are reported considering the first epoch only.

## Default parameters

All parameters are set to their default values.
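For reference, runs like these can be launched with commands along the following lines. This is only a sketch: the data paths are placeholders, and exact flag names may differ between OpenNMT and PyOpenNMT checkouts, so check `th train.lua -h` / `python train.py -h` against your versions.

```shell
# LuaTorch OpenNMT, default parameters
# (add -cudnn to use the cuDNN-backed RNN implementation)
th train.lua -data data/fren-train.t7 -save_model models/fren

# PyOpenNMT, default parameters
# (-max_generator_batches 1 corresponds to the second PyOpenNMT row)
python train.py -data data/fren -save_model models/fren
```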

| Implementation | GPU memory | Average speed |
|---|---|---|
| PyOpenNMT | 4057 MB | 4467.5 source tokens/s |
| PyOpenNMT (`-max_generator_batches 1`) | 2065 MB | 3439.2 source tokens/s |
| OpenNMT | 2021 MB | 4235.2 source tokens/s |
| OpenNMT (`-cudnn`) | 2187 MB | 4375.6 source tokens/s |

## Larger model

4 layers, `rnn_size` 800, and the rest at default values.

| Implementation | GPU memory | Average speed |
|---|---|---|
| PyOpenNMT | 5631 MB | 2382.7 source tokens/s |
| PyOpenNMT (`-max_generator_batches 1`) | 5046 MB | 2029.7 source tokens/s |
| OpenNMT | 3629 MB | 2372.8 source tokens/s |
| OpenNMT (`-cudnn`) | 4899 MB | 2479.5 source tokens/s |

## Longer sequences

Maximum sequence length of 100, 3 layers, `rnn_size` 600, and the rest at default values.
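As a sketch, the two non-default configurations map onto training flags roughly as follows (flag names as in the Lua version; file names are placeholders, and the sequence-length limit is applied at preprocessing time, so verify the options against your checkout):

```shell
# Larger model: 4 layers, rnn_size 800
th train.lua -data data/fren-train.t7 -save_model models/fren-large \
   -layers 4 -rnn_size 800

# Longer sequences: keep pairs up to 100 tokens when building the dataset,
# then train with 3 layers and rnn_size 600
th preprocess.lua -train_src data/train.fr.tok -train_tgt data/train.en.tok \
   -valid_src data/valid.fr.tok -valid_tgt data/valid.en.tok \
   -save_data data/fren-100 -src_seq_length 100 -tgt_seq_length 100
th train.lua -data data/fren-100-train.t7 -save_model models/fren-long \
   -layers 3 -rnn_size 600
```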

| Implementation | GPU memory | Average speed |
|---|---|---|
| PyOpenNMT | 4883 MB | 3442.0 source tokens/s |
| PyOpenNMT (`-max_generator_batches 1`) | 7405 MB | 2673.9 source tokens/s |
| OpenNMT | 7327 MB | 3236.7 source tokens/s |
| OpenNMT (`-cudnn`) | 5359 MB | 3402.4 source tokens/s |

The PyTorch implementation proved to be more memory efficient for long sequences (as also confirmed by @pltrdy’s experiments). CuDNN and the batched generator also seem to improve performance.

What is your experience with the efficiency of OpenNMT and PyOpenNMT? If you got different results for any of the tests above, please share them!
