Whisper CTranslate2

Hello, I’m just wondering if anyone has made a template using Whisper with CTranslate2 for translating audio files longer than 30 seconds. Once CTranslate2 converts the model, is it possible to access the properties the model had before conversion, for example specific layers of the decoder or encoder? Additionally, can we register forward hooks on layers of the Whisper model, specifically the decoder blocks?
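Background on the 30s+ part of the question: Whisper's encoder only sees 30-second windows of 16 kHz audio, so any long-file template has to split the input somehow. Below is a minimal, naive sketch (not anyone's actual template) that cuts the samples into fixed 30-second windows and zero-pads the last one. The function names are hypothetical. Computing log-Mel features and passing each window to the converted model is not shown. openai/whisper's own transcribe loop instead moves the window based on the timestamps it predicts, which avoids cutting words at fixed boundaries.

```python
# Naive sketch: split 16 kHz mono audio into fixed 30 s windows.
# split_into_chunks / chunk_offsets are hypothetical helper names.
from typing import List, Sequence

SAMPLE_RATE = 16_000                          # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30                            # each encoder input covers 30 seconds
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window


def split_into_chunks(samples: Sequence[float],
                      chunk_samples: int = CHUNK_SAMPLES) -> List[List[float]]:
    """Return consecutive fixed-size windows; the last one is zero-padded."""
    chunks: List[List[float]] = []
    for start in range(0, len(samples), chunk_samples):
        chunk = list(samples[start:start + chunk_samples])
        # Pad the tail so every window has the same length.
        chunk.extend([0.0] * (chunk_samples - len(chunk)))
        chunks.append(chunk)
    return chunks


def chunk_offsets(num_samples: int,
                  chunk_samples: int = CHUNK_SAMPLES,
                  sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Start time (seconds) of each window, for shifting per-window timestamps."""
    return [start / sample_rate for start in range(0, num_samples, chunk_samples)]
```

For example, 75 seconds of audio (1,200,000 samples) gives three windows starting at 0, 30 and 60 seconds, and the third is half padding. Timestamps decoded within a window would be shifted by that window's offset.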

+1 on this 🙂


As discussed on GitHub, here’s my take on implementing a full transcription with CTranslate2:

The answer is no to both questions.

Once the model is converted to CTranslate2, it is a black box that runs entirely in C++. This is one of the main reasons it is faster than openai/whisper.

However, if there are popular extensions to the model, we could add them directly to the core implementation.

Thank you for answering my questions and sharing your template. Could you point me to some guides so I can try converting other versions of the model and contribute to this forum/platform? Regarding your question, the extension I was trying to convert was whisper-timestamped, which uses forward hooks in the decoder blocks.
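For context on what such hooks look like: they only work on the original PyTorch model, not on the converted CTranslate2 model. In openai/whisper's PyTorch code, the decoder blocks live in `model.decoder.blocks`. The sketch below uses a hypothetical `ToyDecoder` as a stand-in so it runs without downloading a model. It shows the standard `register_forward_hook` mechanism and is not whisper-timestamped's actual code.

```python
# Sketch: capture per-block outputs with PyTorch forward hooks.
# ToyDecoder is a hypothetical stand-in for a real decoder such as
# model.decoder.blocks in openai/whisper.
import torch
from torch import nn


class ToyDecoder(nn.Module):
    """Hypothetical decoder: a stack of blocks held in a ModuleList."""

    def __init__(self, dim: int = 8, num_blocks: int = 3) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_blocks)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = torch.relu(block(x))
        return x


captured = {}


def make_hook(name):
    # PyTorch calls forward hooks as hook(module, inputs, output) after forward().
    def hook(module, inputs, output):
        # Stores each Linear block's output (before the ReLU in ToyDecoder.forward).
        captured[name] = output.detach()
    return hook


decoder = ToyDecoder()
handles = [block.register_forward_hook(make_hook(f"block_{i}"))
           for i, block in enumerate(decoder.blocks)]

with torch.no_grad():
    decoder(torch.randn(2, 8))

# Remove the hooks once the intermediate outputs have been collected.
for handle in handles:
    handle.remove()
```

Because a CTranslate2 model executes the whole decoder in C++, there is no Python module for such a hook to attach to. This is why an extension like this would have to be built into the core implementation instead.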