I'm experimenting with the Transformer model, and training errors out before 10k steps:
Step 7800/200000; acc: 25.59; ppl: 48.25; xent: 3.88; lr: 0.00096; 212/101 tok/s; 7194 sec
Traceback (most recent call last):
  File "/home/u/OpenNMT-py/train.py", line 109, in <module>
    main(opt)
  File "/home/u/OpenNMT-py/train.py", line 39, in main
    single_main(opt, 0)
  File "/home/u/OpenNMT-py/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "/home/u/OpenNMT-py/onmt/trainer.py", line 192, in train
    self._accum_batches(train_iter)):
  File "/home/u/OpenNMT-py/onmt/trainer.py", line 127, in _accum_batches
    for batch in iterator:
  File "/home/u/OpenNMT-py/onmt/inputters/inputter.py", line 588, in __iter__
    for batch in self._iter_dataset(path):
  File "/home/u/OpenNMT-py/onmt/inputters/inputter.py", line 573, in _iter_dataset
    for batch in cur_iter:
  File "/home/u/anaconda3/lib/python3.6/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/home/u/anaconda3/lib/python3.6/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/home/u/OpenNMT-py/onmt/inputters/text_dataset.py", line 121, in process
    base_data = self.base_field.process(batch_by_feat[0], device=device)
IndexError: list index out of range
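From the last frame, batch_by_feat[0] is empty when the base field is processed, which makes me suspect a batch with no usable examples, e.g. an empty line that survived preprocessing. To rule that out I can scan the preprocessed shards; this is only a minimal sketch, where the shard paths (built from my -data ~/data/data prefix) and the ex.src[0] / ex.tgt[0] layout are assumptions on my part:

import glob
import os

import torch

# Diagnostic sketch (paths and example layout are assumptions):
# scan every preprocessed training shard for examples whose source
# or target side is empty, since an empty column in a batch could
# make batch_by_feat[0] raise IndexError.
for path in sorted(glob.glob(os.path.expanduser('~/data/data.train.*.pt'))):
    dataset = torch.load(path)
    for i, ex in enumerate(dataset.examples):
        # With no extra word features, I assume ex.src / ex.tgt each
        # hold a list with one token list (the base field).
        if not ex.src[0] or not ex.tgt[0]:
            print('%s: example %d has an empty side' % (path, i))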
The documentation lists many hyperparameters for the Transformer, so I'm not sure which of them might be causing this. I had to adjust some of them because my GPU setup couldn't handle the documented values, and I'm also using pretrained embeddings (see the shape check after the flag list). Here are the train params I am using:
python ~/OpenNMT-py/train.py
-pre_word_vecs_enc ~/data/embeddings.enc.pt
-pre_word_vecs_dec ~/data/embeddings.dec.pt
-data ~/data/data
-save_model ~/data/model
-layers 6
-rnn_size 512
-word_vec_size 512
-transformer_ff 2048
-heads 8
-encoder_type transformer
-decoder_type transformer
-position_encoding
-train_steps 200000
-max_generator_batches 2
-dropout 0.1
-batch_size 128
-batch_type tokens
-normalization tokens
-accum_count 2
-optim adam
-adam_beta2 0.998
-decay_method noam
-warmup_steps 8000
-learning_rate 2
-max_grad_norm 0
-param_init 0
-param_init_glorot
-label_smoothing 0.1
-valid_steps 10000
-save_checkpoint_steps 10000
-world_size 1
-gpu_ranks 0
-train_from ~/data/model_step_200.pt
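Since the run uses pretrained embeddings, here is the shape check mentioned above. It's a minimal sketch assuming the .pt files were produced by tools/embeddings_to_torch.py (which, as far as I know, saves plain tensors); each matrix should have one row per vocabulary entry and 512 columns to match -word_vec_size 512:

import os

import torch

# Sanity-check sketch (assumes tools/embeddings_to_torch.py output,
# i.e. plain tensors): both matrices should be (vocab_size, 512)
# to match -word_vec_size 512.
enc = torch.load(os.path.expanduser('~/data/embeddings.enc.pt'))
dec = torch.load(os.path.expanduser('~/data/embeddings.dec.pt'))
print('enc embeddings:', tuple(enc.shape))
print('dec embeddings:', tuple(dec.shape))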
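For what it's worth, the learning rate in the log looks right for the noam schedule, so the optimizer settings at least seem consistent. This is my own sketch of the usual noam formula (which I believe -decay_method noam follows), not OpenNMT-py's code:

# Cross-check sketch: with -learning_rate 2, -rnn_size 512 and
# -warmup_steps 8000, the rate at step 7800 should match the
# lr: 0.00096 printed in the log above.
def noam_lr(step, base_lr=2.0, model_dim=512, warmup=8000):
    return base_lr * model_dim ** -0.5 * min(step ** -0.5,
                                             step * warmup ** -1.5)

print(noam_lr(7800))  # ~0.00096, matching the log line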