We need support for continuous batching, which promises to improve decoding performance, both for the decoder part of encoder-decoder models and for decoder-only models.
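For anyone unfamiliar with the idea, here is a rough toy sketch of why it should help. It only counts decode steps and is not CTranslate2 code: the function names and the "one token per sequence per step" model are my own assumptions, and it ignores prefill cost, memory limits, and everything else a real scheduler has to handle.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until its longest sequence is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished sequence's slot is refilled right away."""
    waiting = deque(lengths)  # output lengths of requests not yet started
    active = []               # tokens still to generate per in-flight request
    steps = 0
    while waiting or active:
        # Fill any free slots from the queue before the next decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One decode step emits one token for every active sequence;
        # sequences that reach zero remaining tokens leave the batch.
        active = [r - 1 for r in active if r - 1 > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    lengths = [2, 8, 3, 7]  # hypothetical output lengths, all positive
    print(static_batching_steps(lengths, batch_size=2))
    print(continuous_batching_steps(lengths, batch_size=2))
```

With these made-up lengths and a batch size of 2, the static schedule takes 15 steps and the continuous one takes 12, because short sequences no longer wait for the longest one in their batch.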
If you are interested in this feature for CTranslate2, this is an open issue: