Hello to everyone reading this.
I have been experimenting with the batch size, trying to optimize memory usage and reduce training time.
(Running OpenNMT 0.3)
What I noticed is that doubling the default batch size leads to iterations that are about 1.5x slower, but of course only half as many iterations per epoch, which in total works out to about 0.75x the training time of the default batch size. That's all good, but now comes the issue: the per-epoch perplexity for the larger batch size was much higher than the perplexity for the default batch size.
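To spell out the arithmetic, here is a quick back-of-the-envelope sketch (plain Python, nothing OpenNMT-specific; the function name is made up and the 1.5x figure is just what I measured on my setup, not a general rule):

```python
# Relative epoch time when scaling the batch size.
# All figures are relative to the default-batch-size run.

def relative_epoch_time(batch_scale: float, iter_slowdown: float) -> float:
    """Total time per epoch relative to the baseline run.

    batch_scale:   batch size relative to the default (e.g. 2.0 for 2x)
    iter_slowdown: per-iteration time relative to the default (e.g. 1.5 for 1.5x)
    """
    iterations = 1.0 / batch_scale     # 2x batch -> half the iterations per epoch
    return iterations * iter_slowdown  # fewer, but individually slower, iterations

print(relative_epoch_time(2.0, 1.5))   # 0.75 -> epoch finishes in 75% of the time
```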
So my questions: are there any guidelines on how to choose the batch size? And can the higher perplexity be explained, or somehow reduced?