Performance analysis numbers

Using the -profiler option, we get the following performance numbers on v0.3 on a GPU (Titan X Pascal), for one epoch on 200K sentences of the baseline-1M-enfr corpus:

  • network 2x500x500 - vocabulary 50K
train:{total:1220.48,
       encoder:{total:128.918,bwd:76.1674,fwd:52.6742},
       decoder:{total:1018.24,
                fwd:224.269,
                bwd:{total:793.903,
                     generator:{total:353.259,bwd:186.989,fwd:163.155},
                     criterion:{total:30.4795,fwd:6.96226,bwd:20.9239}}}},
valid:6.06029
  • network 4x1000x500 - vocabulary 50K
train:{total:2346.22,
       encoder:{total:404.836,fwd:153.731,bwd:251.019},
       decoder:{total:1842.45,
                 bwd:{total:1428.69,
                      criterion:{total:33.7131,bwd:22.9849,fwd:7.55272},
                      generator:{total:657.478,bwd:372.867,fwd:281.834}},
                 fwd:413.687}},
valid:13.0751
  • network 4x1000x500 - vocabulary 100K
train:{total:3034.75,
       encoder(*):{total:407.396,fwd:156.66,bwd:250.653},
       decoder:{total:2499.1,
                fwd:411.751,
                bwd:{total:2087.28,
                     generator:{total:1308.53,fwd:563.358,bwd:742.988},
                     criterion:{total:50.3954,fwd:8.31949,bwd:39.0878}}}},
valid:16.8854
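The nested totals above can be post-processed mechanically. The sketch below transcribes the two 4x1000x500 profiles into Python dicts and computes the generator's share of the training epoch, plus the part of a section's total that none of its listed sub-sections accounts for. The dict layout and the helper names (node_total, unattributed, share) are our own illustrative choices, not part of the -profiler output, and this page does not say what the unattributed time is spent on.

```python
# Two 4x1000x500 profiles from above, transcribed by hand into nested dicts.
# Leaves are raw profiler values; every dict carries its own "total".
PROFILE_VOCAB_50K = {
    "total": 2346.22,
    "encoder": {"total": 404.836, "fwd": 153.731, "bwd": 251.019},
    "decoder": {
        "total": 1842.45,
        "fwd": 413.687,
        "bwd": {
            "total": 1428.69,
            "criterion": {"total": 33.7131, "fwd": 7.55272, "bwd": 22.9849},
            "generator": {"total": 657.478, "fwd": 281.834, "bwd": 372.867},
        },
    },
}

PROFILE_VOCAB_100K = {
    "total": 3034.75,
    "encoder": {"total": 407.396, "fwd": 156.66, "bwd": 250.653},
    "decoder": {
        "total": 2499.1,
        "fwd": 411.751,
        "bwd": {
            "total": 2087.28,
            "generator": {"total": 1308.53, "fwd": 563.358, "bwd": 742.988},
            "criterion": {"total": 50.3954, "fwd": 8.31949, "bwd": 39.0878},
        },
    },
}


def node_total(node):
    """A section is either a bare number (a leaf) or a dict with its own total."""
    return node if isinstance(node, (int, float)) else node["total"]


def unattributed(node):
    """Part of a section's total not covered by its listed sub-sections."""
    return node["total"] - sum(node_total(v) for k, v in node.items() if k != "total")


def share(profile, *path):
    """Fraction of the whole training time spent in the section at `path`."""
    node = profile
    for key in path:
        node = node[key]
    return node_total(node) / profile["total"]


GENERATOR = ("decoder", "bwd", "generator")
for name, profile in [("50K", PROFILE_VOCAB_50K), ("100K", PROFILE_VOCAB_100K)]:
    print(f"vocab {name}: generator = {share(profile, *GENERATOR):.1%} of train, "
          f"unattributed in train total = {unattributed(profile):.1f}")

growth = (node_total(PROFILE_VOCAB_100K["decoder"]["bwd"]["generator"])
          / node_total(PROFILE_VOCAB_50K["decoder"]["bwd"]["generator"]))
print(f"generator time grows {growth:.2f}x when the vocabulary doubles")
```

With these numbers the generator goes from about 28% to about 43% of the training time, and its own time roughly doubles (about 1.99x) when the vocabulary goes from 50K to 100K, while the encoder time barely changes.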

(*) With the -cudnn option (using the cuDNN LSTM implementation for the RNN), the encoder numbers become:

       encoder:{total:196.527,bwd:127.304,fwd:69.1534}
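To quantify the cuDNN gain, the sketch below divides the default encoder timings of the 4x1000x500, 100K-vocabulary run by the -cudnn ones. The dict names and the speedups helper are illustrative only.

```python
# Encoder timings for the 4x1000x500 / 100K-vocabulary run, copied from above.
ENCODER_DEFAULT = {"total": 407.396, "fwd": 156.66, "bwd": 250.653}
ENCODER_CUDNN = {"total": 196.527, "fwd": 69.1534, "bwd": 127.304}


def speedups(baseline, candidate):
    """Per-field ratio baseline / candidate (> 1 means the candidate is faster)."""
    return {key: baseline[key] / candidate[key] for key in baseline}


for key, ratio in speedups(ENCODER_DEFAULT, ENCODER_CUDNN).items():
    print(f"encoder {key}: {ratio:.2f}x faster with -cudnn")
```

This gives roughly 2.07x on the encoder total (about 2.27x forward, 1.97x backward). Only the encoder is reported with -cudnn, so the effect on the full epoch is not measured here.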

Here is a comparison between CPU and GPU on the demo dataset with the default settings (2x500 network, word embedding size 500, vocabulary 36K):

CPU

train:{total:934.303,
       decoder:{total:788.314,
                bwd:{total:653.042,
                     criterion:{total:16.4723,bwd:15.9572,fwd:0.350564},
                     generator:{total:462.385,bwd:305.003,fwd:157.062}},
                fwd:135.265},
       encoder:{total:119.242,bwd:53.3775,fwd:65.8583}},
valid:121.707

GPU

train:{total:47.5369,
       decoder:{total:38.2321,
                bwd:{total:28.4924,
                     criterion:{total:1.48713,bwd:0.98552,fwd:0.364133},
                     generator:{total:16.4715,bwd:8.84765,fwd:7.51978}},
                fwd:9.73594},
       encoder:{total:8.15399,bwd:3.12702,fwd:5.02335}},
valid:4.70236
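The per-section speedup can be read off by dividing each CPU number by the matching GPU number. The sketch below transcribes both profiles, flattens them into dotted paths and ranks the sections by speedup; the dict names and the flatten helper are our own hypothetical choices, not part of the profiler.

```python
# CPU and GPU profiles on the demo dataset, transcribed from above.
# "valid" sits next to "train", as in the profiler output.
CPU = {
    "train": {
        "total": 934.303,
        "decoder": {
            "total": 788.314,
            "bwd": {
                "total": 653.042,
                "criterion": {"total": 16.4723, "bwd": 15.9572, "fwd": 0.350564},
                "generator": {"total": 462.385, "bwd": 305.003, "fwd": 157.062},
            },
            "fwd": 135.265,
        },
        "encoder": {"total": 119.242, "bwd": 53.3775, "fwd": 65.8583},
    },
    "valid": 121.707,
}

GPU = {
    "train": {
        "total": 47.5369,
        "decoder": {
            "total": 38.2321,
            "bwd": {
                "total": 28.4924,
                "criterion": {"total": 1.48713, "bwd": 0.98552, "fwd": 0.364133},
                "generator": {"total": 16.4715, "bwd": 8.84765, "fwd": 7.51978},
            },
            "fwd": 9.73594,
        },
        "encoder": {"total": 8.15399, "bwd": 3.12702, "fwd": 5.02335},
    },
    "valid": 4.70236,
}


def flatten(node, prefix=""):
    """Yield (dotted_path, value) for every number in a nested profile."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


cpu, gpu = dict(flatten(CPU)), dict(flatten(GPU))
speedup = {path: cpu[path] / gpu[path] for path in cpu}

# Rank sections from the largest to the smallest GPU speedup.
for path, ratio in sorted(speedup.items(), key=lambda item: item[1], reverse=True):
    print(f"{path:40s} {ratio:6.2f}x")
```

Ranked this way, the generator backward pass gains the most (about 34x), the whole training epoch about 19.7x and validation about 25.9x, while the criterion forward pass does not benefit at all (about 0.96x, i.e. slightly slower on GPU).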