Hi,
In my quest for the best possible results, I see a difference of about 40 BLEU points between Adagrad with learning rate 0.1 and the default settings (SGD with learning rate 1). Something seems wrong with my Adagrad setup, but what?
I train with these options:
th train.lua -data /home/wiske/tmmt/ennl/nmt/rnn/ennl-train.t7 -save_model /home/wiske/tmmt/ennl/nmt/brnn/rnnsize_750/en2nl.adagrad -optim adagrad -learning_rate 0.1 -encoder_type brnn -rnn_size 750 -gpuid 2 -end_epoch 20
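For context on why the two runs are not directly comparable at the same learning rate: Adagrad divides each step by the running root of accumulated squared gradients, so its effective step size shrinks over training and a given learning rate does not mean the same thing as it does for plain SGD. Below is a minimal, generic sketch of the two update rules on a single toy parameter (not OpenNMT's actual implementation; the `eps` constant and the constant-gradient toy setup are my assumptions for illustration):

```python
import math

def sgd_step(w, g, lr):
    # Plain SGD: the step is always lr * gradient.
    return w - lr * g

def adagrad_step(w, g, accum, lr, eps=1e-8):
    # Adagrad: a per-parameter accumulator of squared gradients
    # shrinks the effective step as training progresses.
    accum += g * g
    w = w - lr * g / (math.sqrt(accum) + eps)
    return w, accum

# Toy comparison: one parameter, constant gradient 0.5, ten updates.
w_sgd, w_ada, accum = 1.0, 1.0, 0.0
for _ in range(10):
    w_sgd = sgd_step(w_sgd, 0.5, lr=1.0)       # forum default: SGD, lr 1
    w_ada, accum = adagrad_step(w_ada, 0.5, accum, lr=0.1)  # Adagrad, lr 0.1
print(w_sgd, w_ada)
```

In this toy run SGD moves 0.5 per step while the Adagrad parameter moves roughly 0.1/sqrt(t) at step t, so after ten steps the two are far apart. That mismatch in effective step size is one plausible reason the same number of epochs yields very different BLEU, though it doesn't rule out a genuine bug.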
Any hints?