How can I configure the server to improve prediction speed?

My configuration file:

{
    "models_root": "./available_models",
    "models": [
        {
            "id": 100,
            "model": "model_0.pt",
            "opt": {
                "gpu": 3,
                "batch_size": 1,
                "beam_size": 1,
                "n_best": 1
            },
            "timeout": 3600,
            "on_timeout": "to_cpu"
        }
    ]
}

[2019-11-11 14:05:42,809 INFO] xxx - - [11/Nov/2019 14:05:42] "POST /translator/translate HTTP/1.1" 200 -
[2019-11-11 14:05:45,945 INFO] Running translation using 100
[2019-11-11 14:05:46,299 INFO] Using model #100 1 inputs
translation time: 0.351007
[2019-11-11 14:05:46,300 INFO] Translation Results: 1
[2019-11-11 14:05:46,300 INFO] - - [11/Nov/2019 14:05:46] "POST /translator/translate HTTP/1.1" 200 -
[2019-11-11 14:05:49,877 INFO] Running translation using 100
[2019-11-11 14:05:50,247 INFO] Using model #100 1 inputs
translation time: 0.367194
[2019-11-11 14:05:50,247 INFO] Translation Results: 1
[2019-11-11 14:05:50,248 INFO] - - [11/Nov/2019 14:05:50] "POST /translator/translate HTTP/1.1" 200 -
[2019-11-11 14:05:51,825 INFO] Running translation using 100
[2019-11-11 14:05:52,147 INFO] Using model #100 1 inputs
translation time: 0.319511

Each prediction takes 0.3+ seconds. How can I improve the prediction speed?

You could, for example, batch your translation requests: send several sentences in a single request instead of one request per sentence, so the per-request overhead is paid only once.
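A minimal sketch of what that looks like against the REST server, assuming it listens on localhost:5000 (adjust the host and port to your setup) and using the model id 100 from the config above:

import requests

# Assumed server address; change to wherever your server is running.
SERVER_URL = "http://localhost:5000/translator/translate"

sentences = [
    "This is the first sentence .",
    "This is the second sentence .",
    "This is the third sentence .",
]

# One POST carrying all sentences instead of one POST per sentence.
# The payload format matches the /translator/translate endpoint:
# a list of {"src": ..., "id": ...} objects.
payload = [{"src": s, "id": 100} for s in sentences]
resp = requests.post(SERVER_URL, json=payload)
resp.raise_for_status()
print(resp.json())  # translations come back in the same order as the inputs

Note that with "batch_size": 1 in the opt block, the sentences in one request are still translated one at a time, so you would likely also want to raise batch_size in the configuration for the batching to pay off.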