Hello Guillaume, netxiao,
Thank you for the responses.
Indeed, the convergence is not as good as with the default settings.
I am using two GPUs: one with 12 GB of memory and one with 4 GB. I noticed that during training the memory of the former is not fully utilized, and it seems wasteful to leave part of it idle. That is why I increased the batch size. And, as I mentioned earlier, the training time decreases as memory utilization increases.
I will experiment a bit more and see whether I can find a way to improve the convergence.
Any suggestions are more than welcome.