Why is the GPU utilization so low?

Hi, I run the demo training with this script:

python train.py -data data/multi30k.atok.low -save_model multi30k_model -gpu_ranks 0 -batch_size 4 -heads 8 -rnn_size 512 -train_steps 1000000 -layers 6 -save_checkpoint_steps 10000 -valid_batch_size 16

The process prints logs like:
[2019-05-03 09:16:04,339 INFO] Step 9850/1000000; acc: 31.21; ppl: 55.07; xent: 4.01; lr: 1.00000; 258/267 tok/s; 1971 sec
[2019-05-03 09:16:14,589 INFO] Step 9900/1000000; acc: 30.24; ppl: 58.23; xent: 4.06; lr: 1.00000; 263/272 tok/s; 1981 sec
[2019-05-03 09:16:24,402 INFO] Step 9950/1000000; acc: 30.09; ppl: 65.20; xent: 4.18; lr: 1.00000; 263/276 tok/s; 1991 sec
[2019-05-03 09:16:33,927 INFO] Step 10000/1000000; acc: 32.21; ppl: 56.55; xent: 4.04; lr: 1.00000; 261/267 tok/s; 2001 sec

but the GPU utilization is very low:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:04:00.0 Off |                  N/A |
| 39%   45C    P2    46W / 175W |   1479MiB /  7951MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20750      C   python                                      1469MiB |
+-----------------------------------------------------------------------------+

iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.43    0.00    1.77    0.11    0.00   92.69

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               8.23       125.37      1586.67    4731417   59880824
dm-0            136.04        73.33       583.22    2767301   22010928

top:
top - 09:22:25 up 10:29, 4 users, load average: 0.92, 0.77, 0.71
Tasks: 212 total, 1 running, 211 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.8 us, 2.9 sy, 0.0 ni, 88.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65733292 total, 31241160 free, 3375296 used, 31116836 buff/cache
KiB Swap: 8382460 total, 8382460 free, 0 used. 61716336 avail Mem

I don’t think the demo training script tells you to run with batch_size 4, or with 8 heads for an RNN.

Read the docs and FAQ, it will help.

Hi @vince62s,

I now use the default script:

python train.py -data data/multi30k.atok.low -save_model multi30k_model -gpu_ranks 0

but the utilization is still low:
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:04:00.0 Off |                  N/A |
| 42%   49C    P2    58W / 175W |   2765MiB /  7951MiB |     32%      Default |

How can I speed it up to reach 100% utilization?

Because it’s a very small network, so it can’t saturate the GPU.
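
As a rough sketch (not a tuned recipe): `-batch_type tokens`, `-batch_size`, and `-accum_count` are standard OpenNMT-py options, but the exact values below are assumptions; larger, token-based batches generally keep the GPU busier than the tiny sentence batches above:

```shell
# Hypothetical example: token-based batching with a larger batch size.
# Flags are standard OpenNMT-py train.py options; the values 4096 and 4
# are guesses to illustrate the idea, not recommended settings.
python train.py -data data/multi30k.atok.low -save_model multi30k_model \
    -gpu_ranks 0 \
    -batch_type tokens \
    -batch_size 4096 \
    -accum_count 4
```

Even so, a model this small may never reach 100% utilization; the per-step kernels are simply too short.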