I’m interested in creating an NMT model that can translate text very quickly on a GPU. For my application, sacrificing BLEU (quality) is acceptable, but I would like to translate 10-100 times faster than the default models (or better).
On my hardware (a GeForce GTX 1080), I get about 340 tokens/sec translation speed out of the box (i.e. without setting any parameters in OpenNMT).
I am considering sweeping through the parameter space for training (http://opennmt.net/OpenNMT/options/train/), but it would be nice to get some expert opinions on which parameters are likely to matter most.
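For reference, the kind of sweep I have in mind would shrink the model at training time and reduce the beam at translation time. A sketch of what I mean, using flag names from the OpenNMT (Lua) docs; the specific values are just illustrative guesses, not recommendations:

```shell
# Train a smaller model: fewer/narrower layers should decode faster.
# -layers, -rnn_size, and -word_vec_size are the knobs I'd sweep first.
th train.lua -data data/demo-train.t7 -save_model demo-small \
    -layers 1 -rnn_size 256 -word_vec_size 256

# At translation time, beam size has a large speed impact;
# -beam_size 1 is greedy decoding. Larger -batch_size helps GPU throughput.
th translate.lua -model demo-small_epoch13_*.t7 -src data/src-test.txt \
    -beam_size 1 -batch_size 64 -gpuid 1
```

My assumption is that decoding cost scales roughly with beam size and with layer count/width, so these would dominate, but I'd welcome corrections.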