I’ve been training a model on the OpenSubtitles2018 en/es parallel corpus (~45M sentences), but after more than 18 hours epoch 1 still hasn’t finished. I’m using a GTX 1070.
From the output, I’ve noticed that the batch counter has already exceeded the reported total number of batches by roughly 10 times (702100/70037). Is this the problem? Can someone explain what is going on here and how I can fix it?
This is the command I’ve used to train:
python3 ../../../../onmtpy/train.py -data opensubs.atok.low -save_model opensubs_default_model -gpuid 0
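For reference, here is a quick back-of-the-envelope check I did on those counters, assuming the default batch size of 64 (I haven’t overridden -batch_size in the command above, so this is just my guess at what the trainer is using):

```python
# Rough sanity check on the epoch counters, assuming a default batch size of 64.
num_sentences = 45_000_000   # approximate size of the en/es corpus
batch_size = 64              # assumed default, not set explicitly in my command

print(num_sentences / batch_size)  # ~703,125 -> close to the running counter (702,700+)
print(70037 * batch_size)          # ~4.48M   -> what the reported total of 70037 would cover
```

So the running counter looks consistent with the full corpus, while the 70037 total would only correspond to a small fraction of it.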
These are the latest lines of the output:
Epoch 1, 702100/70037; acc: 64.64; ppl: 5.63; xent: 1.73; 5911 src tok/s; 6023 tgt tok/s; 65343 s elapsed
Epoch 1, 702150/70037; acc: 65.89; ppl: 5.32; xent: 1.67; 5979 src tok/s; 5970 tgt tok/s; 65348 s elapsed
Epoch 1, 702200/70037; acc: 64.75; ppl: 5.69; xent: 1.74; 5587 src tok/s; 5725 tgt tok/s; 65353 s elapsed
Epoch 1, 702250/70037; acc: 66.44; ppl: 5.14; xent: 1.64; 5688 src tok/s; 5907 tgt tok/s; 65357 s elapsed
Epoch 1, 702300/70037; acc: 66.08; ppl: 5.31; xent: 1.67; 5789 src tok/s; 6056 tgt tok/s; 65362 s elapsed
Epoch 1, 702350/70037; acc: 65.28; ppl: 5.41; xent: 1.69; 5609 src tok/s; 5779 tgt tok/s; 65367 s elapsed
Epoch 1, 702400/70037; acc: 65.04; ppl: 5.54; xent: 1.71; 5847 src tok/s; 5885 tgt tok/s; 65372 s elapsed
Epoch 1, 702450/70037; acc: 65.13; ppl: 5.58; xent: 1.72; 5492 src tok/s; 5743 tgt tok/s; 65376 s elapsed
Epoch 1, 702500/70037; acc: 66.22; ppl: 5.12; xent: 1.63; 6128 src tok/s; 6154 tgt tok/s; 65381 s elapsed
Epoch 1, 702550/70037; acc: 64.83; ppl: 5.61; xent: 1.72; 5363 src tok/s; 5635 tgt tok/s; 65386 s elapsed
Epoch 1, 702600/70037; acc: 64.78; ppl: 5.51; xent: 1.71; 6045 src tok/s; 6137 tgt tok/s; 65391 s elapsed
Epoch 1, 702650/70037; acc: 67.48; ppl: 4.91; xent: 1.59; 5212 src tok/s; 5578 tgt tok/s; 65395 s elapsed
Epoch 1, 702700/70037; acc: 63.57; ppl: 5.99; xent: 1.79; 5617 src tok/s; 5675 tgt tok/s; 65400 s elapsed