Support madlad400 on ctranslate2

Hey, the madlad400 model was recently released! You can access it here:

The model has an open licence (commercial use allowed) and is better than nllb:

I opened this post to ask whether there are plans to support this model architecture in ctranslate2.


A user called jbochi has converted the models to Hugging Face format (link), and since it’s a T5 model, ctranslate2 should support it.
But if you try to convert it using ct2-transformers-converter, you’ll get an error saying that the vocab and embedding sizes are not compatible. This is because the tokenizer has extra tokens that are not in the model itself, which was caused by an incorrect conversion of the sentencepiece model. I made another tokenizer and was able to convert the model to ctranslate2, but the output was repeated gibberish.
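
In case it helps anyone debugging the same size mismatch, here is a rough sketch of how one might list the tokens that have no row in the embedding matrix. The helper name `find_extra_tokens` and the toy vocabulary are made up for illustration; they are not part of ctranslate2 or of the converted checkpoints.

```python
def find_extra_tokens(vocab, embedding_rows):
    """Return (token, id) pairs whose id has no row in the embedding matrix.

    vocab: dict mapping token string -> integer id (hypothetical input).
    embedding_rows: number of rows in the model's input embedding matrix.
    """
    return sorted(
        ((tok, idx) for tok, idx in vocab.items() if idx >= embedding_rows),
        key=lambda pair: pair[1],
    )


# Toy data standing in for a real tokenizer and model (assumption, not real values).
toy_vocab = {
    "<unk>": 0,
    "<s>": 1,
    "</s>": 2,
    "hello": 3,
    "<extra_0>": 4,
    "<extra_1>": 5,
}
toy_embedding_rows = 4

extra = find_extra_tokens(toy_vocab, toy_embedding_rows)
print(f"vocab size: {len(toy_vocab)}, embedding rows: {toy_embedding_rows}")
print("tokens without an embedding row:", extra)
```

With the Hugging Face transformers library, the two inputs would presumably come from `tokenizer.get_vocab()` and `model.get_input_embeddings().weight.shape[0]`. I have not checked what this reports for the madlad400 checkpoints.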

Yes, I can convert the model, but I have the same issue.

Check this

Nice! Thank you.