NLLB-200 refers to a family of open-source pre-trained machine translation models. They can be used via FairSeq or Hugging Face Transformers. Recently, CTranslate2 introduced inference support for some Transformers models, including NLLB. This tutorial aims to provide ready-to-use models in the CTranslate2 format, along with code examples for using these NLLB models in CTranslate2 together with SentencePiece tokenization.
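If you start from the original Hugging Face checkpoint instead of a pre-converted model, the conversion can be done with CTranslate2's Transformers converter (the transformers package must be installed). The sketch below assumes the facebook/nllb-200-3.3B model ID and an output directory name matching the path used later in this tutorial; adjust both to your setup.
from ctranslate2.converters import TransformersConverter

# Convert the Transformers NLLB checkpoint to the CTranslate2 format with int8 quantization.
# The model ID and output directory are illustrative; change them as needed.
converter = TransformersConverter("facebook/nllb-200-3.3B")
converter.convert("nllb-200-3.3B-int8", quantization="int8")
The same conversion is also available through the ct2-transformers-converter command-line tool shipped with CTranslate2. With a converted model in place, the translation code below can be used as is.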
import ctranslate2
import sentencepiece as spm
# [Modify] Set paths to the CTranslate2 and SentencePiece models
ct_model_path = "nllb-200-3.3B-int8"
sp_model_path = "flores200_sacrebleu_tokenizer_spm.model"
device = "cuda" # or "cpu"
# Load the source SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load(sp_model_path)
# Translate a list of sentences
source_sentences = ["Ntabwo ntekereza ko iyi modoka ishaje izagera hejuru yumusozi.",
                    "Kanda iyi buto hanyuma umuryango ukingure",
                    "Ngendahimana yashakaga ikaramu"]
# Source and target language codes
src_lang = "kin_Latn"
tgt_lang = "eng_Latn"
beam_size = 4
source_sentences = [sent.strip() for sent in source_sentences]
target_prefix = [[tgt_lang]] * len(source_sentences)
# Subword the source sentences
source_sents_subworded = sp.encode_as_pieces(source_sentences)
source_sents_subworded = [[src_lang] + sent + ["</s>"] for sent in source_sents_subworded]
print("First subworded source sentence:", source_sents_subworded[0], sep="\n")
# Translate the source sentences
translator = ctranslate2.Translator(ct_model_path, device=device)
translations_subworded = translator.translate_batch(source_sents_subworded, batch_type="tokens", max_batch_size=2024, beam_size=beam_size, target_prefix=target_prefix)
translations_subworded = [translation.hypotheses[0] for translation in translations_subworded]
for translation in translations_subworded:
    if tgt_lang in translation:
        translation.remove(tgt_lang)
# Desubword the target sentences
translations = sp.decode(translations_subworded)
print("First sentence and translation:", source_sentences[0], translations[0], sep="\n• ")
Output:
Translations:
• I don’t think this old car will make it to the top of the hill.
• Click this button and the door will open.
• Ngendahimana was looking for a pen.
Evaluation results on the TICO-19 dataset (3,070 segments) for the English-to-Arabic, English-to-French, and English-to-Kinyarwanda language pairs. The NLLB and OPUS models were converted to the CTranslate2 format with int8 quantization.
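For reference, here is a minimal sketch of how such corpus-level scores can be computed with sacrebleu; the file names are hypothetical placeholders for the system translations and the TICO-19 references, one segment per line.
import sacrebleu

# Hypothetical file names: system outputs and references, one segment per line.
with open("hypotheses.txt", encoding="utf-8") as hyp_file, open("references.txt", encoding="utf-8") as ref_file:
    hypotheses = [line.strip() for line in hyp_file]
    references = [line.strip() for line in ref_file]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")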
If you manage to finetune something, I’m interested to see the results.
Bear in mind that they use a 256k+ vocabulary size, and you may be tempted to reduce it, especially if you don’t use some of the scripts.
With a 24 GB RTX card, it is impossible to finetune the 1.3B / 3.3B models without major changes in the framework (maybe we can envision Fairscale / PyTorch FSDP).
Finetuning the 600M model is not really interesting given the poor performance.
I made small changes to the checkpoint to make it trainable and a few small changes in transforms (committed).
The end result: it works, but the results are behind those of a bilingual model of similar size.
My 2 cents on all of this:
NLLB-200 is useless except for translating low-resource languages to and from high-resource languages.
As a matter of fact I think it is only used for this purpose by Wikipedia.
@vince62s I’m new to all of this and was originally looking to use a bunch of OPUS-MT language-pair models to do ad hoc translation in a social network platform I’m building. But NLLB looks like quite a promising way to simplify it all. So, why do you say that NLLB is useless? Is it because it is much slower than OPUS-MT? Do you have a specific recommendation? Thanks!
If the need is to have the best quality possible for the top 10 languages, then it is better to use bilingual models (OPUS is fine if you want pre-trained ones, even though OPUS is not always SOTA).
If the goal is to cover as many languages as possible, then NLLB is fine, but the quality for the top 10 languages will not be very good.
Thanks! Yeah, for the most part I’ll be running the top languages rather than every possible language.
I’m not looking to do any training myself - I just want something pretrained that has a good balance of quality and speed of translation. Required server memory is also a factor - I’d like to have the model(s) preloaded to eliminate cold start time. I figured NLLB would suit these needs well, as compared to keeping 10-50 OPUS models preloaded.
So, when you say NLLB is “not very good” for top 10 languages, what do you mean? The tables at the top of this thread seem to show NLLB performing fairly similarly to OPUS and Google for some major language pairs…
Hi, I would like to replicate this experiment on my hardware and try some kind of domain adaptation with this model. If I can achieve some improvements with the 600M model, I will try to repeat this with the bigger models.
How do you finetune NLLB? Should I take anything into account before training or is it easy to do?