Ah, you are correct. I made an error in my analysis code: I was counting characters instead of tokens. I will correct my OP.
OK, the corrected figure is ~300 tokens/sec with a fully vanilla, out-of-the-box setup.
Every option that controls the model size impacts performance: -word_vec_size, -rnn_size, -layers, etc. The lower the values, the faster the translation.
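For example, something along these lines (an illustrative `train.lua` invocation; the data paths are placeholders and the sizes are just examples to show the knobs):

```
# Smaller embeddings, a smaller hidden state, and fewer layers
# all reduce the per-step compute, so translation runs faster.
th train.lua -data data/demo-train.t7 -save_model demo-model \
   -word_vec_size 256 -rnn_size 256 -layers 1
```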
Do you know whether there is any existing research on what the tradeoffs in this space look like?
In particular, I expect that below some size, reducing parameters provides no further speedup because the GPU's vectorized operations are no longer saturated, but I'm not familiar enough with ONMT to know where that threshold is.
If not, I might start some experiments and post the results here to benefit others.
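Roughly what I have in mind (an untested sketch, assuming the `train.lua`/`translate.lua` entry points from above; the data paths and the checkpoint glob are assumptions about my local setup):

```
# Hypothetical sweep: train briefly at several sizes, then time
# translation throughput at each. Real experiments would train to
# convergence before benchmarking.
for size in 128 256 512 1024; do
  th train.lua -data data/demo-train.t7 -save_model model_${size} \
     -word_vec_size ${size} -rnn_size ${size} -end_epoch 1
  # Assumes checkpoints are saved as <save_model>_epoch*_*.t7
  /usr/bin/time th translate.lua -model model_${size}_epoch1_*.t7 \
     -src data/src-test.txt -output pred_${size}.txt
done
```

Dividing the token count of src-test.txt by each elapsed time would then give a tokens/sec figure per model size.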