Does the REST server handle concurrency? When sending multiple simultaneous requests to the server, they are handled sequentially, meaning the second request has to wait until the first one is completely handled. Is there any way to support concurrency? Or at least, where in the code are the requests stored until they are handled?
Thank you in advance
No, the example REST server does not support concurrency. But in most cases, this is not a big issue:
- For CPU translation, many cores are already used for a single translation. Executing more requests in parallel will just make all requests slower.
- For GPU translation, I expect that Torch reuses the same CUDA stream by default, meaning that CUDA kernels are ultimately executed sequentially on the device anyway.
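The sequential behavior described above can be reproduced with a toy single-threaded HTTP server; this is just a sketch using Python's standard library, not the actual REST server code, with the 0.5 s sleep standing in for model inference:

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

class SlowHandler(BaseHTTPRequestHandler):
    """Toy handler that simulates a translation taking 0.5 s."""
    def do_GET(self):
        time.sleep(0.5)  # stand-in for model inference
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # silence per-request logging
        pass

# HTTPServer is single-threaded: it handles one request at a time,
# like the example REST server discussed here.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(_):
    return urlopen(f"http://127.0.0.1:{port}/").read()

start = time.time()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(fetch, range(2)))  # fire 2 requests at the same time
elapsed = time.time() - start
server.shutdown()

# Two 0.5 s requests fired together take roughly 1.0 s in total,
# not 0.5 s: the second one waits until the first is done.
print(f"elapsed: {elapsed:.2f}s")
```

Swapping `HTTPServer` for `http.server.ThreadingHTTPServer` makes the two requests overlap, which is one quick way to see the difference concurrency would make.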
So, if two clients request translations at the same time, one of them will have to wait until the other has finished? That doesn't seem ideal for a real-life scenario where many users might want to use the translator.
Yes. But you can still start multiple instances of the REST server and have a load balancer on top of them.
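As a sketch of that setup, assuming two server instances listening on ports 5000 and 5001 (both port numbers hypothetical), an nginx configuration that round-robins requests between them might look like:

```nginx
# Hypothetical ports: start one REST server instance per port.
upstream rest_servers {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 8080;
    location / {
        # Clients talk to port 8080; nginx distributes the
        # requests across the instances round-robin.
        proxy_pass http://rest_servers;
    }
}
```

Note that for GPU instances you would also need enough device memory to load the model once per instance.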
Thank you! I’ll definitely implement a load balancer.
For what it's worth, I have fired 3-4 requests at a single instance of the REST server at virtually the same moment and seen the translations come back at what appeared to be the same time. This was done on a GPU server.
This is true @tel34. However, I added some logging on the server side to see the real behavior. When sending multiple requests, they are handled sequentially, yet the responses to all requests arrive at the same moment, as you pointed out, which is indeed strange behavior. I couldn't guess why.
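The kind of logging mentioned above can be added without touching the handler's logic, e.g. with a timing decorator; this is a generic sketch, and `translate` here is a hypothetical stand-in for the real handler function:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def log_timing(func):
    """Log when a call starts and ends. If start/end pairs from
    different requests never overlap in the log, the requests are
    being handled sequentially."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("start %s", func.__name__)
        t0 = time.time()
        result = func(*args, **kwargs)
        logging.info("end %s (%.2fs)", func.__name__, time.time() - t0)
        return result
    return wrapper

@log_timing
def translate(text):
    # Hypothetical handler: replace with the real translation call.
    time.sleep(0.1)  # stand-in for inference
    return text.upper()

print(translate("hello"))
```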
If requests are handled sequentially, why isn't the time for two requests doubled? Say one file with 5 sentences is translated in 28 seconds. If I fire two requests with the same data in parallel, the total time increases to around 36 seconds for both files, not 56 seconds. Why?