I ran three different tests to compare the efficiency of OpenNMT and PyOpenNMT (and, indirectly, of LuaTorch and PyTorch). Here is the configuration I used:
- Dataset: 200K FREN corpus (tokenized with tokenize.lua; see the sketch below)
- Framework: latest version of Torch, and PyTorch at commit 3330287d (compiled from source with CUDA 8.0.44 and CuDNN 5.1.5)
- Version: OpenNMT@935b69cf, PyOpenNMT@328f059e
- Hardware: server with a GTX 1080 (driver 375.26)
All results are reported for the first epoch only.
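For context, the data preparation followed the usual OpenNMT pipeline. Below is a minimal sketch of the tokenization and preprocessing steps; the file paths are placeholders rather than the actual corpus paths, and flag names should be checked against the commits above.

```bash
# Tokenize both sides of the corpus with the LuaTorch tokenizer.
th tools/tokenize.lua < train.fr > train.fr.tok
th tools/tokenize.lua < train.en > train.en.tok

# Build the serialized training package (produces data/fren-train.t7).
th preprocess.lua -train_src train.fr.tok -train_tgt train.en.tok \
  -valid_src valid.fr.tok -valid_tgt valid.en.tok -save_data data/fren
```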
## Default parameters
All parameters are set to their default values; a command sketch follows the table.
Implementation | GPU memory (MB) | Average speed (source tokens/s) |
---|---|---|
PyOpenNMT | 4057 | 4467.5 |
PyOpenNMT (-max_generator_batches 1) | 2065 | 3439.2 |
OpenNMT | 2021 | 4235.2 |
OpenNMT (-cudnn) | 2187 | 4375.6 |
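For reproduction, these runs correspond to commands along the following lines. This is a sketch rather than the exact invocation: data and model paths are placeholders, and the flags should be verified with `th train.lua -h` and `python train.py -h` at the commits above.

```bash
# LuaTorch version with default parameters; append -cudnn for the
# CuDNN variant reported above.
th train.lua -data data/fren-train.t7 -save_model fren -gpuid 1

# PyTorch version with default parameters. Adding
# -max_generator_batches 1 runs the generator (the output layer)
# over one split at a time instead of the default batched mode.
python train.py -data data/fren.train.pt -save_model fren -gpus 0
```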
## Larger model

4 layers, an rnn_size of 800, and the rest left at default values; a command sketch follows the table.
Implementation | GPU memory (MB) | Average speed (source tokens/s) |
---|---|---|
PyOpenNMT | 5631 | 2382.7 |
PyOpenNMT (-max_generator_batches 1) | 5046 | 2029.7 |
OpenNMT | 3629 | 2372.8 |
OpenNMT (-cudnn) | 4899 | 2479.5 |
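This configuration only overrides two flags relative to the defaults (same placeholder paths and caveats as above):

```bash
# 4 layers, rnn_size 800; everything else at its default.
th train.lua -data data/fren-train.t7 -save_model fren-large \
  -layers 4 -rnn_size 800 -gpuid 1
python train.py -data data/fren.train.pt -save_model fren-large \
  -layers 4 -rnn_size 800 -gpus 0
```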
## Longer sequences

A maximum sequence length of 100, 3 layers, an rnn_size of 600, and the rest left at default values; a command sketch follows the table.
Implementation | GPU memory (MB) | Average speed (source tokens/s) |
---|---|---|
PyOpenNMT | 4883 | 3442.0 |
PyOpenNMT (-max_generator_batches 1) | 7405 | 2673.9 |
OpenNMT | 7327 | 3236.7 |
OpenNMT (-cudnn) | 5359 | 3402.4 |
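Since the sequence-length limit is applied at preprocessing time, this test requires a re-preprocessed dataset. A sketch using the LuaTorch flag names (the PyTorch preprocessing flags may differ, so check `python preprocess.py -h`):

```bash
# Rebuild the data with a 100-token limit on both sides (the default
# is 50), then train the 3-layer, 600-unit model.
th preprocess.lua -train_src train.fr.tok -train_tgt train.en.tok \
  -valid_src valid.fr.tok -valid_tgt valid.en.tok \
  -src_seq_length 100 -tgt_seq_length 100 -save_data data/fren-long
th train.lua -data data/fren-long-train.t7 -save_model fren-long \
  -layers 3 -rnn_size 600 -gpuid 1
```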
The PyTorch implementation proved to be more memory-efficient for long sequences (also confirmed by @pltrdy's experiments). CuDNN and the batched generator also seem to provide better speed: as I understand it, the generator (the final projection and softmax over the full target vocabulary) is the memory-heavy step, so running it in larger batches trades memory for throughput.
What is your experience regarding the efficiency of OpenNMT and PyOpenNMT? If you got different results for one of the tests above, please share them!