Transformer: error halfway through training

I'm trying out the Transformer model implemented in OpenNMT-py. I work on semantic parsing, so my datasets are fairly small (only 80k instances), though the "normal" NMT models still give me good results.

I want to see whether the Transformer still works with such small datasets, but I sometimes get this error about halfway through training:

    for i, batch in enumerate(train_iter):
  File "", line 432, in __iter__
    for batch in self.cur_iter:
  File "", line 157, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/lib/python2.7/site-packages/torchtext/data/", line 27, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/lib/python2.7/site-packages/torchtext/data/", line 185, in process
    padded = self.pad(batch)
  File "/lib/python2.7/site-packages/torchtext/data/", line 203, in pad
    max_len = max(len(x) for x in minibatch)
ValueError: max() arg is an empty sequence

So apparently an empty batch can sometimes be selected at random? It happened after 16,000 steps, for example, with no indication that this was a special moment in training. I used a batch size of 2k, but it also happened with a batch size of 1k. Most experiments run fine, but this exact error has already occurred 4 times.
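For reference, here is a minimal sketch that isolates the failing expression from the last traceback frame. The helper name `max_len` is my own and is not part of torchtext; it only mirrors the `max(len(x) for x in minibatch)` line. As far as I can tell, it suggests the error appears only when the minibatch contains no examples at all, whereas a minibatch holding an empty example just yields a length of 0.

```python
def max_len(minibatch):
    # Hypothetical helper: same expression as the failing line in pad().
    return max(len(x) for x in minibatch)


# A normal minibatch: the longest example has two tokens.
normal = max_len([["a", "b"], ["c"]])

# A minibatch with one empty example does not raise; the max is 0.
one_empty_example = max_len([[]])

# A minibatch with no examples at all raises ValueError.
raised = False
try:
    max_len([])
except ValueError:
    raised = True

print(normal, one_empty_example, raised)
```

If that reading is right, the thing to look for is a batch with zero examples rather than a single empty source or target line.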

Can someone point me in the right direction? Could it be related to the fact that my dataset is quite small, especially relative to the batch size?