Would it be possible to add conversion of Google T5 and Facebook BART from huggingface/transformers to CTranslate2? CTranslate2's performance is outstanding, and very few frameworks support int8 quantization on GPU (ONNX, TF, and PyTorch don't).
Yes, I think we should look into adding these models. I would first need to review the exact architecture they use to estimate the amount of work.
Both T5 and BART can now be converted to CTranslate2. See the examples here:
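For reference, conversion goes through the `ct2-transformers-converter` CLI that ships with CTranslate2. The model names and output directories below are illustrative; any Hugging Face T5 or BART checkpoint should work the same way.

```shell
# Convert a T5 checkpoint from huggingface/transformers to CTranslate2,
# applying int8 quantization at conversion time.
ct2-transformers-converter --model t5-small \
    --output_dir t5_ct2 \
    --quantization int8

# The same converter handles BART checkpoints.
ct2-transformers-converter --model facebook/bart-base \
    --output_dir bart_ct2 \
    --quantization int8
```

The converted directories can then be loaded with `ctranslate2.Translator`, and tokenization is still done with the original `transformers` tokenizer.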