Simple OpenNMT-py REST server


I made a simple REST server to use OpenNMT-py models.

One can load/unload models (on either CPU or GPU) and get translations from them.

I] Start the server


0) Get the code

The code is not merged yet (see the PR); in the meantime, you can pull it from my fork:

git remote add pltrdy https://github.com/pltrdy/OpenNMT-py
git pull pltrdy server:server
git checkout server

1) Install Flask

pip install flask

2) Put some models

mkdir available_models/
cp $path_to_my_model available_models

3) Start the server

python server.py

II] API Usage


0) Set the hostname

export hostname="127.0.0.1"

1) list models

curl http://$hostname:5000/models

Result:

  "available": [
    "wmt14.en-de_acc_69.22_ppl_4.33_e9.pt",
    "wmt14.en-de_acc_69.22_ppl_4.33_e9.light.pt"
  ],
  "loaded": []
}

2) load a model

We can now load a model from this list. This query accepts the same parameters as translate.py, e.g.

curl -i -X POST -H "Content-Type: application/json" -d '{"model": "wmt14.en-de_acc_69.22_ppl_4.33_e9.pt", "gpu": 0, "beam_size": 5}' http://$hostname:5000/load_model

Result:

{"load_time": 3.0516507625579834, "model_id": 0, "status": "ok"}

The model_id returned here will be needed at translation time to identify which model to use.
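Since the endpoint takes translate.py parameters, other decoding options should work the same way. For instance (an untested sketch, assuming the standard translate.py options n_best and replace_unk are accepted here):

curl -i -X POST -H "Content-Type: application/json" -d '{"model": "wmt14.en-de_acc_69.22_ppl_4.33_e9.light.pt", "gpu": -1, "beam_size": 10, "n_best": 1, "replace_unk": true}' http://$hostname:5000/load_model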

3) Translate

(this example involves subwords)

curl -i -X POST -H "Content-Type: application/json" -d '{"text": "▁the ▁mould ▁edge s ▁( cor n ers ) ▁of ▁the ▁steel ▁in go t ▁mould ."}' http://$hostname:5000/translate/0

Result:

{
  "model_id": 0,
  "result": "\u2581die \u2581Formen kant en \u2581( K \u00f6r ner ) \u2581des \u2581Stahl g u\u00df form .\n",
  "status": "ok",
  "time": {
    "total": 8.510261535644531,
    "translation": 8.509992599487305,
    "writing_src": 0.0002689361572265625
  }
}
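The same request can be made from Python; a quick sketch using the third-party requests package (hostname and model_id match the examples above):

import requests

hostname = "127.0.0.1"  # where server.py is running (see step 0)
model_id = 0            # the model_id returned by /load_model

payload = {"text": "▁the ▁mould ▁edge s ▁( cor n ers ) ▁of ▁the ▁steel ▁in go t ▁mould ."}
r = requests.post(f"http://{hostname}:5000/translate/{model_id}", json=payload)
r.raise_for_status()

out = r.json()
print(out["result"])  # the translation, still subword-segmented
print(out["time"])    # timing breakdown, as in the curl example above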

4) Unload a model

# unload model number 1
curl http://$hostname:5000/unload_model/1

5) At any time, check which models are loaded

For example, we may have several models loaded at once, some on GPU and some on CPU.
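This is the same /models call as in step 1:

curl http://$hostname:5000/models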

{
  "available": [
    "wmt14.en-de_acc_69.22_ppl_4.33_e9.pt",
    "wmt14.en-de_acc_69.22_ppl_4.33_e9.light.pt"
  ],
  "loaded": [
    {
      "gpu": -1,
      "model": "./available_models/wmt14.en-de_acc_69.22_ppl_4.33_e9.light.pt",
      "model_id": 0
    },
    {
      "gpu": 0,
      "model": "./available_models/wmt14.en-de_acc_69.22_ppl_4.33_e9.light.pt",
      "model_id": 2
    },
    {
      "gpu": 0,
      "model": "./available_models/model_2.pt",
      "model_id": 3
    }
  ]
}

NOTES:

  • I have been removing the optim part of the checkpoint to make it smaller, which speeds up loading. See the discussion (and script) about it here; a minimal sketch of the idea follows these notes.
  • It’s quite a naïve implementation; don’t hesitate to contribute or to discuss how it could be improved.
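For the first note, here is a minimal sketch of the idea (assuming the saved file is a plain dict with an 'optim' entry, as OpenNMT-py checkpoints are; paths and filenames are placeholders):

import torch

# Load the full checkpoint (a plain dict in OpenNMT-py).
checkpoint = torch.load("available_models/model.pt", map_location="cpu")

# Drop the optimizer state; the weights and vocab are untouched.
checkpoint["optim"] = None

torch.save(checkpoint, "available_models/model.light.pt")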