REST server concurrency

anderleich · July 19, 2018, 2:57pm

Hi,
Does the REST server handle concurrency? When I send multiple simultaneous requests to the server, they are handled sequentially, which means the second request has to wait until the first one is completely handled. Is there any way to support concurrency? Or at least, where in the code are the requests stored until they are handled?
Thank you in advance

guillaumekln · July 20, 2018, 8:00am

No, the example REST server does not support concurrency. But in most cases, this is not a big issue:

For CPU translation, many cores are already used for a single translation. Executing more requests in parallel will just make all requests slower.
For GPU translation, I expect that Torch re-uses the same CUDA stream by default, meaning that CUDA kernels are eventually executed sequentially on the device.

anderleich · July 20, 2018, 11:58am

So, if two clients request translations at the same time, one of them will have to wait until the other one has finished? That doesn’t seem ideal for a real-life scenario, where many users might want to use the translator.

guillaumekln · July 20, 2018, 12:03pm

Yes. But you can still start multiple instances of the REST server and have a load balancer on top of them.
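
To illustrate the idea, here is a minimal sketch of round-robin selection across several server instances. The class name, the ports and the URLs are assumptions made up for this example and are not part of the REST server. In practice a dedicated reverse proxy (nginx, HAProxy or similar) would usually do this job; the sketch only shows the rotation logic.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out backend URLs in rotation; safe to call from several threads."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))
        self._lock = threading.Lock()

    def next_backend(self):
        # The lock keeps two threads from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


# Hypothetical addresses: one REST server instance per port.
BACKENDS = [
    "http://localhost:7784",
    "http://localhost:7785",
    "http://localhost:7786",
]

balancer = RoundRobinBalancer(BACKENDS)
picked = [balancer.next_backend() for _ in range(4)]
print(picked)
```

Each incoming request would be forwarded to the URL returned by `next_backend()`, so a request only waits behind others that were sent to the same instance.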

anderleich · July 20, 2018, 12:57pm

Thank you! I’ll definitely implement a load balancer.

tel34 · July 20, 2018, 4:52pm

For what it’s worth, I can say that I have fired 3-4 requests at a single instance of the REST server at virtually the same moment and seen the translations come back at what appears to be the same time. This was done using a GPU server.

anderleich · July 20, 2018, 5:08pm

This is true, @tel34. However, I have added some logs on the server side to see the real behavior. When sending multiple requests, they are handled sequentially, although the answers to all requests are received at the same moment, as you pointed out, which is indeed strange behavior. I can’t figure out why.
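
For anyone who wants to check this from the client side as well, here is a rough sketch that fires several requests at once and records when each one was sent and answered. `time_requests` and `fake_sequential_send` are names invented for this example. The fake sender only simulates a server that handles one request at a time (a lock plus a fixed delay), so the script runs without a network; to test a real server you would pass in your own function that performs the HTTP call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def time_requests(send, payloads):
    """Fire all payloads at once; return (per-request timings, total seconds)."""
    t0 = time.monotonic()

    def timed(payload):
        start = time.monotonic() - t0
        send(payload)
        return start, time.monotonic() - t0

    # One worker thread per payload, so all requests are in flight together.
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        timings = list(pool.map(timed, payloads))
    return timings, time.monotonic() - t0


# Stand-in for a server that handles one request at a time (an assumption,
# used so the sketch runs without a network): a lock plus a fixed delay.
_server_lock = threading.Lock()
SERVICE_TIME = 0.05  # seconds per simulated request


def fake_sequential_send(payload):
    with _server_lock:
        time.sleep(SERVICE_TIME)


timings, total = time_requests(fake_sequential_send, ["a", "b", "c"])
for start, end in sorted(timings, key=lambda t: t[1]):
    print(f"sent at {start:.3f}s, answered at {end:.3f}s")
print(f"total: {total:.3f}s")
```

With a strictly sequential server, the answer times are staggered and the total is roughly the number of requests times the time for one. If the answers arrive together, or the total is well below that, something other than plain one-after-another handling is involved, and the server-side logs are the place to look.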

ajitesh3 · December 23, 2019, 9:54am

If requests are handled sequentially, why isn’t the time for 2 requests doubled? Let’s say one file of mine with 5 sentences is translated in 28 sec. But if I fire two requests with the same data in parallel, the total time increases to around 36 sec for both files, not 56 sec?