What is the best way to clean up `ctranslate2.Translator` after it has been initialized and used? Should the Python garbage collector be able to free up allocated memory?
opened 08:21AM - 27 Apr 23 UTC
closed 07:30AM - 03 May 23 UTC
I have used a simple Django server for my application https://github.com/kolserd… av/ana/tree/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate .
Django `urls.py` file https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/urls.py#L1-L27
Django `translate` handler file https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/api/translate.py#L1-L16
My `Translate` class https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/core/translate.py#L1-L41
After I start the server, I start repeating the same request with Curl:
```sh
curl -X POST -d '{"q": "test", "source":"en", "target":"ru"}' -H 'Content-Type: application/json' http://127.0.0.1:8000/translate
```
In another window, I open `top` with a filter by name `python`:
```sh
top | grep python
```
After a certain number of repeated requests (how many depends on server resources), I see that the `python` process consumes a significant amount of memory. This memory is never freed, even after the requests stop. The consumption stays at this level until the process is restarted:
```sh
77899 kol 20 0 7468592 3.0g 174108 S 1.3 19.0 0:37.09 python
77899 kol 20 0 7697984 3.1g 174108 S 45.5 20.1 0:38.46 python
77899 kol 20 0 8418944 3.7g 174108 S 60.8 23.8 0:40.29 python
77899 kol 20 0 7730640 3.4g 174108 S 38.5 21.9 0:41.45 python
77899 kol 20 0 8091136 3.6g 174108 S 62.5 23.3 0:43.33 python
77899 kol 20 0 8091136 3.6g 174108 S 3.3 23.3 0:43.43 python
77899 kol 20 0 8091136 3.6g 174108 S 3.0 23.3 0:43.52 python
77899 kol 20 0 8091136 3.6g 174108 S 3.3 23.3 0:43.62 python
77899 kol 20 0 8091136 3.6g 174108 S 1.3 23.3 0:43.66 python
77899 kol 20 0 8091136 3.6g 174108 S 1.7 23.3 0:43.71 python
77899 kol 20 0 8091136 3.6g 174108 S 2.3 23.3 0:43.78 python
77899 kol 20 0 8091136 3.6g 174108 S 3.0 23.3 0:43.87 python
77899 kol 20 0 8091136 3.6g 174108 S 2.3 23.3 0:43.97 python
77899 kol 20 0 8091136 3.6g 174108 S 2.0 23.3 0:44.03 python
```
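The RES column in the `top` output above is the resident set size of the process. A minimal sketch for logging the same figure from inside the server, so it can be printed before and after each request, is shown here. It assumes a Linux host with `/proc` mounted; the helper names `rss_bytes` and `log_rss` are hypothetical and not part of the project.

```python
import os


def rss_bytes():
    """Return this process's resident set size in bytes, or None if /proc is unavailable."""
    try:
        # /proc/self/statm lists sizes in pages; the second field is the resident set.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def log_rss(label):
    """Print the current resident set size in MiB with a label (hypothetical helper)."""
    rss = rss_bytes()
    if rss is not None:
        print(f"{label}: {rss / 2**20:.1f} MiB")


log_rss("before request")
```

Calling `log_rss` at the start and end of the request handler would show whether growth happens per request or only at model load.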
__If you continue to make requests, then the process will soon crash with status 247__
I will be grateful for any help.
`self.translator` is always `None` on every translation request. This means that for each translation request, the program creates a new `PackageTranslation` instance. This can lead to a memory leak if `ctranslate2` stores some process-bound data globally.
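One way to avoid building a new instance on every request is to keep a single translator per language pair for the lifetime of the process. The following is only a sketch of that idea: `TranslatorCache` and `fake_factory` are hypothetical names, and the factory stands in for whatever code actually constructs the translator in the project.

```python
import threading


class TranslatorCache:
    """Keeps one translator object per (source, target) pair for the process lifetime."""

    def __init__(self, factory):
        self._factory = factory          # callable that builds a translator (assumption)
        self._lock = threading.Lock()    # request handlers may run in several threads
        self._instances = {}

    def get(self, source, target):
        key = (source, target)
        with self._lock:
            if key not in self._instances:
                # Built only on the first request for this language pair.
                self._instances[key] = self._factory(source, target)
            return self._instances[key]


# Stand-in factory that records how many times it was called.
created = []


def fake_factory(source, target):
    created.append((source, target))
    return object()


cache = TranslatorCache(fake_factory)
first = cache.get("en", "ru")
second = cache.get("en", "ru")
print(first is second, len(created))  # True 1
```

With a module-level cache like this, repeated requests for the same language pair reuse one object, so any per-instance allocation happens once instead of once per request.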
Hello,
have you tried forcing the garbage collector?
I had a similar issue with DataFrames in my preprocessing… and forcing the garbage collector solved it for me.
Example:
```python
import gc
n = gc.collect()
print("Number of unreachable objects collected by GC:", n)
```
I basically called `n = gc.collect()` right after I was done with some massive DataFrame, to make sure the garbage collector reclaimed it quickly.
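To illustrate what forcing a collection can and cannot do, here is a small sketch with a hypothetical `Model` class standing in for a large object. It holds a reference cycle, so dropping the last name is not enough and only `gc.collect()` reclaims it. Note the assumption: this covers Python objects only. Memory held by a native library, or kept by the allocator after being freed, may not return to the operating system.

```python
import gc
import weakref


class Model:
    """Hypothetical stand-in for a large object such as a translator."""

    def __init__(self):
        self.buffer = bytearray(1024 * 1024)
        self.owner = self  # reference cycle: the refcount alone never reaches zero


def release_demo():
    gc.disable()  # keep the demo deterministic: no automatic collection in between
    try:
        model = Model()
        probe = weakref.ref(model)   # lets us check whether the object still exists
        del model                    # drops the last name, but the cycle keeps it alive
        alive_before = probe() is not None
        gc.collect()                 # the cycle detector reclaims the object
        alive_after = probe() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


print(release_demo())  # (True, False)
```

Objects without cycles are freed as soon as their last reference goes away, so an explicit `gc.collect()` only helps when cycles are involved.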
Best regards,
Samuel
Hello Argo,
Just out of curiosity, did my suggestion solve your problem?
I’m not sure, I haven’t been able to reliably reproduce the memory leak. However, people have told me they get memory leaks.
When I run this script on my computer the memory doesn’t noticeably increase.
committed 06:24PM - 30 Aug 23 UTC
```python
import argostranslate.package
import argostranslate.translate

from_code = "en"
to_code = "es"

# Translate in an endless loop to watch for memory growth
while True:
    translatedText = argostranslate.translate.translate("Hello World", from_code, to_code)
    print(translatedText)
```
adityarg (Aditya Raghuwanshi), November 1, 2023, 4:54am
Hello Argo, were you able to find a working solution for the problem?
I am also encountering a similar issue. I am using `translator.translate_batch()` in the API call, and memory usage seems to increase with every request and does not go down when I stop calling the API.
I need to restart the services to free up the memory.
Thanks
I still haven’t been able to find the source of the memory leak. We run CTranslate2 inside of LibreTranslate on an Nginx server with multiple processes that automatically restart, so the memory leak doesn’t cause many problems.
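The thread does not say how those worker processes are managed. As one possible sketch of the same workaround, assuming the application is served by Gunicorn, a hypothetical `gunicorn.conf.py` can recycle each worker after a fixed number of requests so that leaked memory is returned when the worker exits. The values are illustrative, not tuned.

```python
# gunicorn.conf.py (hypothetical file; values are illustrative assumptions)
workers = 2                # number of worker processes
max_requests = 500         # restart a worker after it has handled this many requests
max_requests_jitter = 50   # randomize the limit so workers do not all restart at once
```

The file would be passed to Gunicorn with its `-c` option. The trade-off is that each restarted worker has to load its models again.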