OpenNMT Forum

Training fails while getting batch


After running for several hours and successfully saving about 5 checkpoints, training fails with this error:

**./onmt/data/Dataset.lua:102: attempt to index a nil value** 
./onmt/data/Dataset.lua:102: in function 'getBatch'
./onmt/train/Trainer.lua:277: in function 'trainEpoc'
./onmt/train/Trainer.lua:484: in function 'train'

Line 102 of Dataset.lua appears to be where the start of the batch range is set. I'm a little stuck on what the cause may be. Could it be related to changing batch_size when resuming training from a checkpoint?

(Guillaume Klein) #2

What batch size did you set?


I initially used a batch size of 200. When resuming from the checkpoint, I increased it to 300.

(This issue was resolved by starting training over without specifying batch_size)
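
For anyone hitting the same error, here is a minimal sketch of one way a changed batch size could lead to indexing a missing batch. It is an illustration in Python, not OpenNMT code, and the cause was not confirmed in this thread. It assumes that a checkpoint stores a batch order computed for the old batch size and that the order is reused after the data is re-batched with the new size. The names `build_batch_ranges`, `get_batch`, and `saved_batch_order`, and the dataset size of 1200, are all hypothetical.

```python
# Hypothetical illustration only; this is not OpenNMT code.

def build_batch_ranges(num_examples, batch_size):
    """Split example indices 0..num_examples-1 into consecutive ranges."""
    return [
        {"begin": start, "end": min(start + batch_size, num_examples) - 1}
        for start in range(0, num_examples, batch_size)
    ]


def get_batch(batch_ranges, idx):
    """Return the range for batch idx, or None if idx is out of bounds.

    None plays the role of Lua's nil here: indexing into it would fail.
    """
    if 0 <= idx < len(batch_ranges):
        return batch_ranges[idx]
    return None


NUM_EXAMPLES = 1200  # made-up dataset size

old_ranges = build_batch_ranges(NUM_EXAMPLES, 200)  # 6 batches
new_ranges = build_batch_ranges(NUM_EXAMPLES, 300)  # 4 batches

# Assumption: the checkpoint kept a batch order built for batch size 200.
saved_batch_order = list(range(len(old_ranges)))

# Batch indices from the saved order that no longer exist after re-batching.
stale = [idx for idx in saved_batch_order if get_batch(new_ranges, idx) is None]
print(stale)
```

Under these assumptions, a larger batch size yields fewer batches, so some saved indices point past the end of the new batch list. That would be consistent with the fix above, since restarting from scratch rebuilds any batch order for the batch size actually in use.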