Add NLLB support

Hello, Facebook has just released a new translation model, NLLB, which should replace M2M100. It supports 200 languages and offers better translation quality than the old one. Could it be supported in CTranslate2? I tried to convert it with this command, without success:

ct2-transformers-converter --model facebook/nllb-200-distilled-600M --output_dir ct2_model

ValueError: Tokenizer class NllbTokenizer does not exist or is not currently imported.

Here is the link to the model: facebook/nllb-200-distilled-600M · Hugging Face
Have a nice day and thanks for reading!

This PR is adding support for NLLB models:

The architecture is actually the same as M2M-100.

Note that the conversion requires transformers==4.21.0, which is not yet released! I will wait for this new version before merging the changes.
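For anyone scripting the conversion, here is a minimal sketch of a check to run before calling the converter. It treats 4.21.0 as a minimum rather than an exact pin, and it counts dev or rc builds of 4.21.0 as sufficient; both are assumptions on my part. The helper names (`parse_version`, `supports_nllb`, `installed_transformers_version`) are made up for illustration and are not part of CTranslate2 or transformers.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers version mentioned in this thread (assumed to be a lower bound).
MIN_TRANSFORMERS = (4, 21, 0)


def parse_version(text):
    """Turn '4.21.0' or '4.21.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece such as 'dev0' or '0rc1'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad short versions like '4.21' to (4, 21, 0)
    return tuple(parts[:3])


def supports_nllb(installed):
    """Return True if the given version string is at least MIN_TRANSFORMERS."""
    return parse_version(installed) >= MIN_TRANSFORMERS


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None
```

Calling `supports_nllb(installed_transformers_version() or "0")` before running `ct2-transformers-converter` gives an early, readable failure instead of the `NllbTokenizer does not exist` error.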


Okay, thank you!

I just released CTranslate2 2.21.0 with NLLB support. Check out the example in the documentation:

https://opennmt.net/CTranslate2/guides/transformers.html#nllb
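As a rough sketch of what usage looks like once the model is converted (the linked guide is the authoritative version): the source text is tokenized with the Hugging Face tokenizer, the target language code is passed as a target prefix, and that prefix token is dropped from the output before decoding. The `ct2_model` directory is the output of the command earlier in this thread. The language codes `eng_Latn` and `fra_Latn` and the function names `translate` and `strip_target_prefix` are choices made for this example.

```python
def strip_target_prefix(tokens, tgt_lang):
    """Drop the leading language token that the decoder echoes back."""
    if tokens and tokens[0] == tgt_lang:
        return tokens[1:]
    return list(tokens)


def translate(text, model_dir="ct2_model", src_lang="eng_Latn", tgt_lang="fra_Latn"):
    """Translate one sentence with a converted NLLB model (sketch, not tuned)."""
    # Imported lazily so the helper above can be used without these packages.
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(model_dir)
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang=src_lang
    )

    # CTranslate2 expects token strings, not ids.
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))

    # The target language is selected by forcing its code as the first target token.
    results = translator.translate_batch([source], target_prefix=[[tgt_lang]])
    target = strip_target_prefix(results[0].hypotheses[0], tgt_lang)

    return tokenizer.decode(tokenizer.convert_tokens_to_ids(target))


# Example call (needs the converted model and both packages installed):
# print(translate("Hello world!"))
```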
