Would it be possible to add conversion of Google's T5 and Facebook's BART from huggingface/transformers to CTranslate2? CTranslate2's performance seems outstanding, and very few frameworks support int8 quantization on GPU (ONNX, TensorFlow, and PyTorch don't).
Yes, I think we should look into adding these models. I would first need to review the exact architectures they use to estimate the amount of work.