I am currently hosting a trained model online and serving translation requests on the fly. I have already lowered the beam_size, but I noticed that most of the time is wasted re-initializing the model for every single request. Does anyone know:
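For context, this is roughly how each request is handled right now (the model filename and the beam_size value are just from my setup, so treat them as illustrative): every call launches translate.lua from scratch, which re-loads the whole model before translating anything.

```python
# Rough sketch of my current per-request workflow: a fresh translate.lua
# process is started for every request, so model loading is paid each time.
import subprocess

def translate_once(src_path, out_path):
    subprocess.run(
        [
            "th", "translate.lua",
            "-model", "model_final.t7",   # my trained model (illustrative name)
            "-src", src_path,
            "-output", out_path,
            "-beam_size", "2",            # already lowered from the default
        ],
        check=True,
    )
```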
How else can I increase the speed of translation?
How can I configure the model or translate.lua to stay active and “listen” for translation requests, so that it does not have to be terminated and re-initialized between calls?
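To make it clearer what I mean by “listen”, here is a minimal conceptual sketch of the pattern I am after (this is not OpenNMT code, and load_model/translate are placeholders): pay the expensive initialization once at startup, then keep the process alive and answer many requests with the resident model.

```python
# Conceptual sketch only: load once, then serve many translation requests.
import socket

def load_model():
    # Placeholder for the expensive one-time initialization that
    # translate.lua currently repeats on every call.
    return object()

def translate(model, text):
    # Placeholder for the actual beam-search translation step.
    return text.upper()

def serve(host="127.0.0.1", port=5556):
    model = load_model()                       # paid once, at startup
    with socket.create_server((host, port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                src = conn.recv(65536).decode("utf-8")
                out = translate(model, src)    # reuses the resident model
                conn.sendall(out.encode("utf-8"))

if __name__ == "__main__":
    serve()
```

Is there an existing way to get translate.lua (or the toolkit around it) to behave like this, or do people roll their own wrapper?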