Long story a little shorter, I’m using an Nvidia P4 GPU to run Whisper on a battery-powered device. For whatever reason, the P4 will not throttle down while a process is loaded on it: it stays in the P0 performance state, drawing 23 W. I’m trying to work around this by having my application load the Whisper model only when needed and unload it afterwards. I load the model with:
self.model = ctranslate2.models.Whisper(...)
I can delete the object:
del self.model
I cannot, however, figure out whether there is a way to unload the model from the GPU. Deleting the object doesn’t seem to remove the process from the GPU. It stays there, keeping the GPU in the P0 state, until I actually terminate my application.
Is there some solution to this problem?
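For reference, here is a minimal sketch of the load/unload cycle described above. The `Transcriber` class, the `loader` callable, and the explicit `gc.collect()` call are assumptions made for illustration. In the real application the loader would be whatever constructs `ctranslate2.models.Whisper(...)`.

```python
import gc
from typing import Any, Callable, Optional


class Transcriber:
    """Hypothetical wrapper that holds a model only while it is needed."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        # `loader` is an assumed factory; in the real application it would
        # build the ctranslate2 Whisper model with the actual arguments.
        self._loader = loader
        self.model: Optional[Any] = None

    def load(self) -> None:
        # Build the model only if it is not already loaded.
        if self.model is None:
            self.model = self._loader()

    def unload(self) -> None:
        # Drop the only reference, then force a garbage-collection pass
        # so the Python-side object is released promptly.
        self.model = None
        gc.collect()

    @property
    def loaded(self) -> bool:
        return self.model is not None
```

This covers only the Python side of the cycle. It releases the object, but as described above, the process still shows up on the GPU afterwards.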