Validating trained model

I’ve been trying to use OpenNMT with a basic architecture proposed in a recent paper to generate questions from text. So far, however, I get only the same one or two questions regardless of the input. Can anyone help me understand how to validate and tweak the model so that it works? I am not aware of any standard techniques for doing this.


A couple of questions:

  • Which paper are you referring to?
  • How much data do you have?
  • What does the validation perplexity look like?
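For reference, validation perplexity is the exponential of the average per-token negative log-likelihood on held-out data. Here is a minimal Python sketch, assuming you already have natural-log probabilities for each target token. The function name and the example values are made up for illustration and are not part of OpenNMT.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability per target token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical values: every target token is predicted with probability 0.25.
example = [math.log(0.25)] * 4
print(perplexity(example))  # roughly 4: as uncertain as a uniform 4-way choice
```

A lower value means the model assigns higher probability to the reference tokens, but it says nothing directly about how varied the generated questions are.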

Hi Klein,
I am referring to “Neural Question Generation for Reading Comprehension”. I have about 70k instances in the training set and 10k each in the validation and test sets. I use the model with the best perplexity for translation.

Did you compute BLEU scores as in the paper? Maybe you should just contact the authors as they happen to use OpenNMT.
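If it helps, here is a rough Python sketch of a corpus-level BLEU computation, assuming one tokenized reference per hypothesis and no smoothing. It is a simplified illustration, not the script used in the paper or shipped with OpenNMT, so the numbers will not necessarily match published scores. The example sentences are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)

# Hypothetical tokenized sentences, for illustration only.
reference = "who was featured on the single ?".split()
print(corpus_bleu([reference], [reference]))  # an exact match scores 100.0
print(corpus_bleu(["what is the name ?".split()], [reference]))
```

A model that emits the same question for every input will usually score very low on such a metric even when its perplexity looks reasonable, so it is worth checking both.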

I did contact the authors, and they say they obtained the reported results with the same configuration that I am currently using with OpenNMT. Still, I somehow get output like the following for the test data.
Test data:
SOS in 2010 , beyonc was featured on lady gaga 's single telephone ’ ’ and its music video . EOS
SOS the song topped the us pop songs chart , becoming the sixth number-one for both beyonc and gaga , tying them with mariah carey for most number-ones since the nielsen top 40 airplay chart launched in 1992 . EOS

SOS what was the name of the first person that madonna was a part of ? EOS
SOS what was the name of the book that madonna was a member of ? EOS

SOS and EOS mark the start and end of each sentence.
Can you suggest why the model seems to get stuck in one state, so that the generated output is almost always the same?
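One way to quantify how repetitive the output is: count how many distinct predictions, or distinct opening phrases, the model produces. The sketch below is a hypothetical helper in Python, not an OpenNMT tool. It assumes one prediction per string, tokens separated by spaces, and at least one non-empty prediction.

```python
from collections import Counter

def output_diversity(predictions, prefix_len=None):
    """Return (distinct ratio, most common key, its count).

    With prefix_len set, predictions are compared on their first
    prefix_len tokens only, which catches near-duplicates.
    """
    keys = []
    for line in predictions:
        tokens = line.split()
        if not tokens:
            continue  # skip empty predictions
        if prefix_len is not None:
            tokens = tokens[:prefix_len]
        keys.append(" ".join(tokens))
    if not keys:
        raise ValueError("no non-empty predictions given")
    counts = Counter(keys)
    most_common, freq = counts.most_common(1)[0]
    return len(counts) / len(keys), most_common, freq

# The two outputs quoted above.
outputs = [
    "SOS what was the name of the first person that madonna was a part of ? EOS",
    "SOS what was the name of the book that madonna was a member of ? EOS",
]
print(output_diversity(outputs))                # as whole lines the two differ
print(output_diversity(outputs, prefix_len=7))  # but they share the first 7 tokens
```

Run over the whole set of test predictions, a distinct ratio close to zero would suggest that the decoder is largely ignoring the source sentence.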

A little about the architecture I am using:
The encoder and decoder each have 2 layers with a hidden state size of 600. I use SGD, pre-trained embeddings that are kept fixed during training, and a global attention model. I train the whole model for 12-15 epochs. The detailed configuration is below.

th train.lua -data data/qg-train.t7 -save_model model -rnn_size 600 -layers 2 -optim sgd -learning_rate 1 -learning_rate_decay 0.5 -start_decay_at 8 -max_batch_size 64 -dropout 0.3 -start_epoch 1 -end_epoch 12 -max_grad_norm 5 -word_vec_size 300 -pre_word_vecs_enc data/qg-src-emb-embeddings-300.t7 -pre_word_vecs_dec data/qg-tgt-emb-embeddings-300.t7 -fix_word_vecs_enc 1 -fix_word_vecs_dec 1 -gpuid 1 -brnn 1 -attention global -rnn_type LSTM -input_feed 1 -global_attention dot

What is the learning rate history according to the training logs? On small datasets, the validation perplexity often fluctuates a lot, which could start the learning rate decay too early.
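To illustrate the concern, here is a small Python simulation of the schedule implied by `-learning_rate 1 -learning_rate_decay 0.5 -start_decay_at 8`. It assumes the decay rule is: start decaying once the epoch reaches `start_decay_at` or the validation perplexity goes up, then multiply the learning rate by the decay factor after every following epoch. That rule is my assumption about the default behaviour, so please check it against your training logs. The perplexity values are invented.

```python
def learning_rate_history(val_perplexities, learning_rate=1.0,
                          decay=0.5, start_decay_at=8):
    """Learning rate used in each epoch under the assumed decay rule."""
    history = []
    decaying = False
    previous = None
    for epoch, ppl in enumerate(val_perplexities, start=1):
        history.append(learning_rate)
        # Assumed rule: decay begins once the epoch reaches start_decay_at
        # or validation perplexity rises, and then applies every epoch.
        if epoch >= start_decay_at or (previous is not None and ppl > previous):
            decaying = True
        if decaying:
            learning_rate *= decay
        previous = ppl
    return history

# Invented validation perplexities for 12 epochs.
steady = [60, 50, 44, 40, 37, 35, 33, 32, 31, 30.5, 30.2, 30]
bumpy = [60, 50, 52, 45, 41, 38, 36, 34, 33, 32, 31.5, 31]  # rises at epoch 3

print(learning_rate_history(steady)[-1])  # 0.0625 in the final epoch
print(learning_rate_history(bumpy)[-1])   # 0.001953125 in the final epoch
```

Under this assumed rule, a single early bump in validation perplexity leaves the learning rate 32 times smaller by epoch 12, which could be enough to stall training.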

Hey, the perplexity is decreasing with each epoch. I’ve tried using Adam instead of SGD, and now I see a few random translations rather than the same output for every input. Can you point me to appropriate resources on training encoder-decoder systems? Thanks for your help :smiley: