Unload Whisper Model from GPU in Python

Long story a little shorter: I’m using an Nvidia P4 GPU to run Whisper on a battery-powered device. For whatever reason, the P4 will not throttle down while a process is loaded on it (it stays in the P0 performance mode, drawing 23 W). I’m trying to work around this by having my application load the Whisper model only when needed and unload it afterwards. I load the model with

self.model = ctranslate2.models.Whisper(...)

I can delete the object

del self.model

I cannot, however, figure out whether there is a way to fully release the GPU. Deleting the object doesn’t detach my process from the GPU: the process stays attached, keeping the GPU in P0 mode until I actually terminate my application.
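For reference, a stripped-down version of the load/unload cycle I’m using (the stub object stands in for the real ctranslate2 model so the sketch runs anywhere; in my app the load call is the one shown above):

```python
import gc

class Transcriber:
    def load(self):
        # In the real app this is:
        # self.model = ctranslate2.models.Whisper(...)
        self.model = object()  # stand-in so the sketch runs without a GPU

    def unload(self):
        # Drop the only reference and force a collection so the model's
        # GPU buffers are freed as promptly as possible.
        del self.model
        gc.collect()
```

This does free most of the model’s memory, but it is not enough to get the process off the GPU entirely, which is the problem described below.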

Is there some solution to this problem?

To clarify: GPU memory usage does decrease when I unload the model, but my process is still listed on the GPU (about 100K remains in memory), and that alone keeps the GPU in the P0 state. Is it possible to completely “disconnect” from the GPU without ending the Python application?
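One workaround I’m considering, since the CUDA context belongs to the process that created it and is only torn down when that process exits: keep all GPU work in a short-lived child process, so the main application never touches the GPU at all. A minimal sketch (the worker body is a stand-in for the real ctranslate2 calls, and the model path would be a placeholder):

```python
import multiprocessing as mp

# "fork" keeps this sketch deterministic on Linux; the parent never
# initializes CUDA, so forking a worker before any GPU work is safe.
_ctx = mp.get_context("fork")

def _transcribe_worker(audio_path, result_queue):
    # All GPU work lives in this child process. Its CUDA context is
    # destroyed when the process exits, so the parent should drop off
    # the GPU entirely between jobs. The commented lines are what the
    # real worker would do:
    # import ctranslate2
    # model = ctranslate2.models.Whisper(...)
    # ... run transcription with model ...
    result = "transcript for " + audio_path  # stand-in for the real output
    result_queue.put(result)

def transcribe(audio_path):
    queue = _ctx.Queue()
    proc = _ctx.Process(target=_transcribe_worker, args=(audio_path, queue))
    proc.start()
    result = queue.get()   # read before join() to avoid a pipe deadlock
    proc.join()            # child exits -> its CUDA context is torn down
    return result
```

The obvious cost is paying the model-load time on every call, so I’d want to confirm the GPU actually drops out of P0 between calls before committing to this.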