CTranslate2 3.0 release

guillaumekln · November 7, 2022, 2:45pm

We just released version 3.0 of CTranslate2! Here’s an overview of the main changes:

First speech-to-text model: Whisper

The main highlight of this version is the integration of the Whisper speech-to-text model that was published by OpenAI a few weeks ago.

Its architecture is very similar to a text-to-text Transformer model, but it uses Conv1D layers to transform the audio features. On GPU, Conv1D layers are implemented using cuDNN, which is a new optional dependency.

The current implementation already supports many CTranslate2 features and optimizations such as quantization, asynchronous execution, decoding with random sampling, etc.

See a conversion and usage example in the Transformers guide.

Removal of the decoding options `normalize_scores` and `allow_early_exit`

These options have been removed to provide a better default behavior and improve consistency with other frameworks:

- The scores are now always divided by `pow(length, length_penalty)`, with `length_penalty` defaulting to 1.
- The beam search exits early only when no penalties are used.

The outputs are expected to be slightly different following this change.
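
To make the new scoring rule concrete, here is a minimal sketch of the normalization in plain Python. It assumes the raw score of a hypothesis is its cumulative log probability and that `length` is its number of tokens. The function name `normalized_score` and the sample values are made up for illustration and are not part of the CTranslate2 API.

```python
def normalized_score(cumulative_log_prob, length, length_penalty=1.0):
    """Divide a raw score by pow(length, length_penalty)."""
    return cumulative_log_prob / pow(length, length_penalty)


# Two made-up hypotheses: (cumulative log probability, number of tokens).
short_hyp = (-4.0, 4)
long_hyp = (-4.5, 6)

# Default length_penalty of 1: scores become per-token averages.
short_score = normalized_score(*short_hyp)  # -4.0 / 4
long_score = normalized_score(*long_hyp)    # -4.5 / 6

# length_penalty of 0: pow(length, 0) == 1, so the raw scores are compared.
short_raw = normalized_score(*short_hyp, length_penalty=0)
long_raw = normalized_score(*long_hyp, length_penalty=0)

print(short_score, long_score)
print(short_raw, long_raw)
```

In this toy case the longer hypothesis ranks first with the default penalty, while the shorter one ranks first when the penalty is 0. Reordering of this kind is one reason outputs may differ slightly from earlier versions.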

Compatibility with OpenNMT-py V3 checkpoints

As mentioned in "OpenNMT-py v3.0 is out!", the latest OpenNMT-py version changed how the vocabularies are saved in the checkpoints. The CTranslate2 converter has been updated accordingly while still supporting older checkpoints.

New `config.json` file in the model directory

Newly converted models now include an additional configuration file, `config.json`. This file is meant to hold non-structural model parameters, such as those related to the input and the vocabulary. For example:

```json
{
  "add_source_bos": false,
  "add_source_eos": false,
  "bos_token": "<s>",
  "decoder_start_token": "</s>",
  "eos_token": "</s>",
  "unk_token": "<unk>"
}
```

In the future, the file could contain other useful information about the model and even set the default translation/generation options to use for this model.
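
Since the file is plain JSON, it can be inspected with the standard library alone. The sketch below is an illustration, not something CTranslate2 provides: the helper `load_model_config` is hypothetical, and it assumes that a model directory without `config.json` (for example one converted with an earlier version) should simply yield an empty dictionary. The demo writes the example above into a temporary directory so that it runs without a real model.

```python
import json
import tempfile
from pathlib import Path


def load_model_config(model_dir):
    """Return the parsed config.json of a model directory, or {} if absent."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        # Assumption: a directory without the file is treated as "no config".
        return {}
    with config_path.open(encoding="utf-8") as config_file:
        return json.load(config_file)


# Demo: a throwaway directory standing in for a converted model.
example_config = {
    "add_source_bos": False,
    "add_source_eos": False,
    "bos_token": "<s>",
    "decoder_start_token": "</s>",
    "eos_token": "</s>",
    "unk_token": "<unk>",
}

with tempfile.TemporaryDirectory() as model_dir:
    empty_config = load_model_config(model_dir)  # no config.json yet
    config_file = Path(model_dir) / "config.json"
    config_file.write_text(json.dumps(example_config), encoding="utf-8")
    config = load_model_config(model_dir)

print(config["decoder_start_token"])
print(empty_config)
```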

Passing and returning N-dimensional arrays in Python

The Python module exposes a new StorageView class which is used to pass or return N-dimensional arrays, for example:

- to pass audio features to the Whisper model
- to return the full LM output logits from the new method `Generator.forward_batch`

The object implements the Array Interface and the CUDA Array Interface, meaning you can convert arrays from or to NumPy and PyTorch without copying the underlying data. See an example in the class documentation.
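
To show what "without copying" means in practice, here is a small sketch of the Array Interface protocol using NumPy only. `ToyStorage` is a hypothetical stand-in written for this illustration; it is not the actual `StorageView` class and only mimics the CPU side of the protocol by exposing `__array_interface__`.

```python
import numpy as np


class ToyStorage:
    """Hypothetical stand-in for an object exposing the Array Interface."""

    def __init__(self, array):
        # Keep a reference to a contiguous array so the memory stays alive.
        self._owner = np.ascontiguousarray(array)

    @property
    def __array_interface__(self):
        # Describe the existing buffer (shape, dtype, data pointer) to consumers.
        return self._owner.__array_interface__


x = np.ones((2, 4), dtype=np.int32)
storage = ToyStorage(x)

# NumPy builds an array on top of the described buffer instead of copying it.
view = np.asarray(storage)

# A write through the original array is visible through the view.
x[0, 2] = 3

print(np.shares_memory(view, x))
print(view[0, 2])
```

With the real class, the conversion goes through `ctranslate2.StorageView.from_array` in one direction and a regular NumPy array constructor in the other; the class documentation remains the reference for the exact usage.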

This major version also comes with other breaking changes that should not impact most usages. See the full release notes for more details.

ymoslem · November 11, 2022, 7:33pm

That is a great effort. Many thanks to Guillaume and all the team.

Can we please have BLOOM support? Thanks again!