NLLB-200 with CTranslate2

NLLB-200 refers to a family of open-source pre-trained machine translation models. They can be used via FairSeq or Hugging Face Transformers. Recently, CTranslate2 added inference support for some Transformers models, including NLLB. This tutorial provides ready-to-use models in the CTranslate2 format, along with code examples for using these NLLB models in CTranslate2 with SentencePiece tokenization.

Download NLLB-200 models

Load the model and tokenizer

import ctranslate2
import sentencepiece as spm

# [Modify] Set paths to the CTranslate2 and SentencePiece models
ct_model_path = "nllb-200-3.3B-int8"
sp_model_path = "flores200_sacrebleu_tokenizer_spm.model"

device = "cuda"  # or "cpu"

# Load the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load(sp_model_path)

# Load the CTranslate2 model
translator = ctranslate2.Translator(ct_model_path, device=device)

Translate a list of sentences

source_sentences = ["Ntabwo ntekereza ko iyi modoka ishaje izagera hejuru yumusozi.",
                    "Kanda iyi buto hanyuma umuryango ukingure",
                    "Ngendahimana yashakaga ikaramu"]

# Source and target language codes
src_lang = "kin_Latn"
tgt_lang = "eng_Latn"

beam_size = 4

source_sentences = [sent.strip() for sent in source_sentences]
target_prefix = [[tgt_lang]] * len(source_sentences)

# Subword the source sentences
source_sents_subworded = sp.encode_as_pieces(source_sentences)
source_sents_subworded = [[src_lang] + sent + ["</s>"] for sent in source_sents_subworded]
print("First subworded source sentence:", source_sents_subworded[0], sep="\n")

# Translate the source sentences
translations_subworded = translator.translate_batch(source_sents_subworded, batch_type="tokens", max_batch_size=2024, beam_size=beam_size, target_prefix=target_prefix)
translations_subworded = [translation.hypotheses[0] for translation in translations_subworded]

# Remove the target language code that was forced as the target prefix
for translation in translations_subworded:
  if tgt_lang in translation:
    translation.remove(tgt_lang)

# Desubword the target sentences
translations = sp.decode(translations_subworded)

print("First sentence and translation:", source_sentences[0], translations[0], sep="\n• ")


Translations of the three source sentences:
• I don’t think this old car will make it to the top of the hill.
• Click this button and the door will open.
• Ngendahimana was looking for a pen.
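The pre- and post-processing in the code above boils down to three list operations: wrap each subworded sentence with the source language code and `</s>`, force the target language code as the first target token, and strip that code from the hypothesis before detokenizing. Below is a minimal sketch of those steps as standalone helpers; the function names are hypothetical, and the subword pieces in the example are made up rather than real SentencePiece output.

```python
def add_nllb_markers(pieces_batch, src_lang):
    """Wrap each subworded sentence as [src_lang] + pieces + ["</s>"]."""
    return [[src_lang] + pieces + ["</s>"] for pieces in pieces_batch]


def make_target_prefix(batch_size, tgt_lang):
    """Build one single-token target prefix per sentence (a separate list each)."""
    return [[tgt_lang] for _ in range(batch_size)]


def strip_target_prefix(hypotheses, tgt_lang):
    """Drop the target language code when it is the first token of a hypothesis."""
    return [hyp[1:] if hyp and hyp[0] == tgt_lang else hyp for hyp in hypotheses]


if __name__ == "__main__":
    # Made-up pieces standing in for sp.encode_as_pieces() output
    pieces = [["▁Kanda", "▁iyi", "▁buto"]]
    source = add_nllb_markers(pieces, "kin_Latn")
    prefix = make_target_prefix(len(source), "eng_Latn")
    # A made-up hypothesis standing in for translate_batch() output
    hypotheses = [["eng_Latn", "▁Click", "▁this", "▁button"]]
    print(source)
    print(prefix)
    print(strip_target_prefix(hypotheses, "eng_Latn"))
```

Building the prefix with a list comprehension gives each sentence its own list, whereas `[[tgt_lang]] * len(source_sentences)` repeats one shared inner list; that is harmless as long as the prefixes are only read.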


You can also use this Google Colab notebook.

Licence of models


Relevant projects


Evaluation results on the TICO-19 dataset (3,070 segments) for the English-to-Arabic, English-to-Spanish, English-to-French, English-to-Kinyarwanda, and English-to-Chinese language pairs. NLLB and OPUS models were converted to the CTranslate2 format with int8 quantization.

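| Language pair | System | spBLEU ↑ | chrF++ ↑ | TER ↓ | COMET ↑ |
|---|---|---|---|---|---|
| EN-AR | NLLB 600M | 35.66 | 54.6 | 62.07 | 54.53 |
| EN-AR | NLLB 1.2B | 41.1 | 58.51 | 57.15 | 63.85 |
| EN-AR | NLLB 3.3B | 43.42 | 60.11 | 55.58 | 66.8 |
| EN-AR | OPUS (bt-big) | 43.11 | 60.79 | 57.24 | 63.64 |
| EN-AR | Google Cloud Translation API | 43.56 | 61.58 | 57.79 | 65.5 |
| EN-ES | NLLB 600M | 53.31 | 72.19 | 37.13 | 83.09 |
| EN-ES | NLLB 1.2B | 56.1 | 73.85 | 34.96 | 85.91 |
| EN-ES | NLLB 3.3B | 57.47 | 74.6 | 33.99 | 86.86 |
| EN-ES | OPUS (bt-big) | 54.99 | 72.66 | 36.26 | 83.69 |
| EN-ES | Google Cloud Translation API | 58.98 | 75.17 | 32.46 | 86.62 |
| EN-FR | NLLB 600M | 43.25 | 64.17 | 51.28 | 56.16 |
| EN-FR | NLLB 1.2B | 46.3 | 66.25 | 48.68 | 59.76 |
| EN-FR | NLLB 3.3B | 47.25 | 66.88 | 48.19 | 60.91 |
| EN-FR | OPUS (bt-big) | 46.05 | 65.08 | 49.79 | 56.28 |
| EN-FR | Google Cloud Translation API | 46.81 | 66.34 | 47.01 | 59.01 |
| EN-RW | NLLB 600M | 19.46 | 47.61 | 80.01 | N/A |
| EN-RW | NLLB 1.2B | 23.6 | 50.73 | 74.53 | N/A |
| EN-RW | NLLB 3.3B | 25.1 | 52.53 | 73.2 | N/A |
| EN-RW | OPUS (Tatoeba 2021) | 1.38 | 15.32 | 153.58 | N/A |
| EN-RW | OPUS (2020) | 5.58 | 27.05 | 101.25 | N/A |
| EN-RW | Google Cloud Translation API | 20.63 | 48.37 | 73.54 | N/A |
| EN-ZH | NLLB 600M | 24.9 | 33.87 | 109.37 | 39.28 |
| EN-ZH | NLLB 1.2B | 29.02 | 37.45 | 110.22 | 50.05 |
| EN-ZH | NLLB 3.3B | 31.35 | 39.08 | 109.52 | 53.89 |
| EN-ZH | OPUS | 37.51 | 40.72 | 121.49 | 50.4 |
| EN-ZH | Google Cloud Translation API | 48.58 | 52.02 | 70.87 | 73.62 |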
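Note that TER is an error rate (lower is better), while spBLEU, chrF++ and COMET are higher-is-better, so the "best" system depends on each metric's direction. A small sketch that encodes the direction explicitly, using the EN-AR and EN-RW scores copied from the table; the data layout and function name are illustrative, not part of any evaluation toolkit.

```python
# True if a higher score is better for this metric
HIGHER_IS_BETTER = {"spBLEU": True, "chrF++": True, "TER": False, "COMET": True}
METRICS = list(HIGHER_IS_BETTER)

# (spBLEU, chrF++, TER, COMET) copied from the table; None marks N/A
ROWS = {
    "EN-AR": {
        "NLLB 600M": (35.66, 54.6, 62.07, 54.53),
        "NLLB 1.2B": (41.1, 58.51, 57.15, 63.85),
        "NLLB 3.3B": (43.42, 60.11, 55.58, 66.8),
        "OPUS (bt-big)": (43.11, 60.79, 57.24, 63.64),
        "Google Cloud Translation API": (43.56, 61.58, 57.79, 65.5),
    },
    "EN-RW": {
        "NLLB 600M": (19.46, 47.61, 80.01, None),
        "NLLB 1.2B": (23.6, 50.73, 74.53, None),
        "NLLB 3.3B": (25.1, 52.53, 73.2, None),
        "OPUS (Tatoeba 2021)": (1.38, 15.32, 153.58, None),
        "OPUS (2020)": (5.58, 27.05, 101.25, None),
        "Google Cloud Translation API": (20.63, 48.37, 73.54, None),
    },
}


def best_system(pair, metric):
    """Return the best-scoring system for a metric, or None if nothing was scored."""
    col = METRICS.index(metric)
    scored = [(name, s[col]) for name, s in ROWS[pair].items() if s[col] is not None]
    if not scored:
        return None
    # Maximize higher-is-better metrics, minimize error rates such as TER
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(scored, key=lambda item: item[1])[0]


for pair in ROWS:
    for metric in METRICS:
        print(f"{pair} {metric}: {best_system(pair, metric)}")
```

For EN-AR, this picks the Google Cloud Translation API for spBLEU and chrF++ but NLLB 3.3B for TER and COMET, and it returns None for COMET on EN-RW, where no scores were reported.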

Nice!
Do you happen to have the speed in tok/sec for each?

I will try to import the NLLB models in OpenNMT-py to see if we can finetune them.

Hi Vincent!

I ran it for the English-to-Spanish language pair on an A100 GPU (40 GB memory) from Google Colab Pro.

TICO-19 Statistics

Source language: English
Sentences (unique): 3,070
Words: 70,365
Tokens (flores200 spm tokenizer): 108,844

Time for English-to-Spanish

NLLB 3.3B: 93.6s
NLLB 1.2B: 61.2s
NLLB 600M: 37.1s
OPUS: 17.3s
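Since the question was about tok/sec, the times above can be turned into an approximate source-side throughput by dividing the flores200 token count by each time. This is a back-of-the-envelope sketch: it counts source tokens only (not generated target tokens), and it leaves out OPUS, which uses its own tokenizer, so the 108,844-token count does not apply to it.

```python
# Figures reported above for the English side of TICO-19
SOURCE_TOKENS = 108_844  # flores200 SentencePiece tokens
SENTENCES = 3_070

# Reported English-to-Spanish translation times in seconds (A100, Colab Pro)
TIMES = {"NLLB 3.3B": 93.6, "NLLB 1.2B": 61.2, "NLLB 600M": 37.1}


def per_second(count, seconds):
    """Items processed per second of wall-clock time."""
    return count / seconds


for model, seconds in TIMES.items():
    tok_s = per_second(SOURCE_TOKENS, seconds)
    sent_s = per_second(SENTENCES, seconds)
    print(f"{model}: ~{tok_s:,.0f} source tok/s, ~{sent_s:.1f} sentences/s")
```

This gives roughly 1,160, 1,780 and 2,930 source tokens per second for the 3.3B, 1.2B and 600M models on this setup.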


On the OpenNMT-py version of the NLLB-200 models I uploaded here: OpenNMT-py models - OpenNMT

I am getting about 600 tok/second with the 3.3B model on an RTX 3090.

It is not so bad, but CTranslate2 is the fastest inference toolkit around.


Getting 403 on the NLLB downloads. Could you check? Thanks!

Hi James!

Can you please try wget, either in your machine’s Terminal, or maybe Google Colab or PythonAnywhere.

wget ""

I hope this helps.


Yes, the zip version works. The models on the page you posted above are linked directly to .pt files.

I got the 3.3B instead:

However, those are CTranslate2 versions. I was looking to get .pt versions so that I could finetune on them with OpenNMT.

Ah, okay. @vince62s could you please make sure that the models at OpenNMT-py models - OpenNMT are public. Thanks!

Sorry, they are public now.


Thanks! Links work now!

If you manage to finetune something, I’m interested to see the results.
Bear in mind that they use a vocabulary of 256K+ tokens, so you may be tempted to reduce it, especially if you don’t need some of the scripts.
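One way to act on this, sketched under assumptions: if only Latin-script languages are needed, you could keep the vocabulary entries whose letters are all Latin (plus any special tokens) and drop the rest. The function names are hypothetical, and a real reduction would also have to slice the matching rows of the shared embedding matrix and keep the tokenizer consistent with the new vocabulary, which is not shown here.

```python
import unicodedata


def is_latin_only(token):
    """True if every alphabetic character in the token is a Latin letter."""
    return all(
        "LATIN" in unicodedata.name(ch, "")
        for ch in token
        if ch.isalpha()  # digits, punctuation and the "▁" marker are not checked
    )


def indices_to_keep(vocab, special_tokens):
    """Indices of vocabulary entries to keep: special tokens and Latin-only pieces."""
    return [
        i for i, tok in enumerate(vocab)
        if tok in special_tokens or is_latin_only(tok)
    ]


if __name__ == "__main__":
    # A toy vocabulary; the real NLLB-200 dictionary has roughly 256K entries
    vocab = ["<s>", "</s>", "▁the", "▁é", "▁привет", "▁你好", "eng_Latn", "zho_Hans", "123"]
    keep = indices_to_keep(vocab, special_tokens={"<s>", "</s>"})
    print([vocab[i] for i in keep])
```

Language-code tokens such as `zho_Hans` are written in ASCII, so they survive this filter even though the script they stand for does not.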

The difficulty now is to collect data and clean it. I will also need to figure out the YAML file needed for training (the model architecture, etc.).

I can help you with that.

OK, I ended up trying to finetune NLLB-200.

With a 24GB RTX card, it is impossible to finetune the 1.3B / 3.3B models without major changes in the framework (maybe we could consider FairScale / PyTorch FSDP).
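A rough back-of-the-envelope check of why 24 GB is not enough, assuming standard mixed-precision training with Adam at about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). The estimate ignores activations entirely, so real memory usage is higher.

```python
GIB = 1024 ** 3

# Assumed bytes per parameter for mixed-precision Adam training:
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + fp32 Adam first moment (4) + fp32 Adam second moment (4)
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def training_state_gib(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Memory for weights, gradients and optimizer state, excluding activations."""
    return num_params * bytes_per_param / GIB


for name, params in {"600M": 0.6e9, "1.3B": 1.3e9, "3.3B": 3.3e9}.items():
    print(f"NLLB {name}: ~{training_state_gib(params):.1f} GiB before activations")
```

Under these assumptions, the 3.3B model needs about 49 GiB for training state alone and the 1.3B model about 19 GiB, which leaves little of a 24 GB card for activations.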

Finetuning the 600M model is not really interesting given the poor performance.

I made small changes to the checkpoint to make it trainable and a few small changes in transforms (committed).
End result: it works, but the results are behind a bilingual model of similar size.

My 2 cents on all of this:
NLLB-200 is useless except for translating low-resource languages to and from high-resource languages.
As a matter of fact, I think this is the only purpose Wikipedia uses it for.

@vince62s I’m new to all of this and was originally looking to use a bunch of OPUS-MT language-pair models to do a bunch of ad hoc translation in a social network platform I’m building. But NLLB looks like quite a promising way to simplify it all. So, why do you say that NLLB is useless? Is it because it is much slower than OPUS-MT? Do you have a specific recommendation? Thanks!

If the need is to have the best possible quality on the top 10 languages, then it is better to use bilingual models (OPUS is fine if you want pre-trained ones, even though OPUS is not always SOTA).

If the goal is to cover as many languages as possible, then NLLB is fine, but the quality on the top 10 languages will not be very good.


Thanks! Yeah, for the most part I’ll be running the top languages rather than every possible language.

I’m not looking to do any training myself - I just want something pretrained that has a good balance of quality and speed of translation. Required server memory is also a factor - I’d like to have the model(s) preloaded to eliminate cold start time. I figured NLLB would suit these needs well, as compared to keeping 10-50 OPUS models preloaded.

So, when you say NLLB is “not very good” for top 10 languages, what do you mean? The tables at the top of this thread seem to show NLLB performing fairly similarly to OPUS and Google for some major language pairs…

Hi, I would like to replicate this experiment on my hardware and try some kind of domain adaptation with this model. If I can achieve some improvements with the 600M model, I will try to repeat this with the bigger models.

How do you finetune NLLB? Should I take anything into account before training or is it easy to do?

I updated the 600M checkpoint on S3.

Key options are below (for the rest, you can use whatever you usually use: adam, …)

share_vocab: true
src_vocab: "/nllb-200/dictionary.txt"
src_words_min_frequency: 1
src_vocab_size: 257000
tgt_vocab: "/nllb-200/dictionary.txt"
tgt_words_min_frequency: 1
tgt_vocab_size: 257000
src_vocab_multiple: 8

Corpus opts:

path_src: "/en-de/cc-matrix-ende.en"
path_tgt: "/en-de/"
transforms: [sentencepiece, prefix, suffix, filtertoolong]
weight: 10
src_prefix: ""
tgt_prefix: "deu_Latn"
src_suffix: " eng_Latn"
tgt_suffix: ""
update_vocab: true
train_from: "/nllb-200/"
reset_optim: all
save_data: "/nllb-200"
save_model: "/nllb-200/nllb-200-600M-onmt"
decoder_start_token: ''
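The prefix and suffix transforms attach language-code tokens to each training example. The sketch below mimics that behavior in simplified form, splitting each configured string on whitespace and adding the resulting tokens at the start or end of the tokenized sentence. It illustrates the idea only and is not OpenNMT-py's implementation; the option values are copied as they appear in the corpus options above, and the subword pieces are made up.

```python
def apply_prefix_suffix(tokens, prefix="", suffix=""):
    """Return prefix tokens + tokens + suffix tokens (option strings split on whitespace)."""
    return prefix.split() + tokens + suffix.split()


# Option values as written in the corpus options; subword pieces are made up
src = apply_prefix_suffix(["▁Hello", "▁world"], prefix="", suffix=" eng_Latn")
tgt = apply_prefix_suffix(["▁Hallo", "▁Welt"], prefix="deu_Latn", suffix="")
print(src)
print(tgt)
```

Because the strings are split on whitespace, an empty option adds nothing and the leading space in `" eng_Latn"` does not produce an empty token.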


src_subword_model: "/nllb-200/flores200_sacrebleu_tokenizer_spm.model"
tgt_subword_model: "/nllb-200/flores200_sacrebleu_tokenizer_spm.model"
encoder_type: transformer
decoder_type: transformer
enc_layers: 12
dec_layers: 12
heads: 16
hidden_size: 1024
word_vec_size: 1024
transformer_ff: 4096
dropout_steps: [0, 15000, 30000]
dropout: [0.1, 0.1, 0.1]
attention_dropout: [0.1, 0.1, 0.1]
share_decoder_embeddings: true
share_embeddings: true
position_encoding: true
position_encoding_type: 'SinusoidalConcat'
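As a sanity check on the architecture options, a rough parameter count for a Transformer with these settings (12 encoder and 12 decoder layers, hidden size 1024, feed-forward size 4096, and one shared 257,000 x 1024 embedding matrix) lands close to the advertised 600M. The formula is a simplified sketch: it counts attention projections, feed-forward layers, layer norms and their biases, and ignores the sinusoidal position encodings (which have no weights), any final layer norms, and the output-layer bias.

```python
def attention_params(d):
    # Q, K, V and output projections, each d x d with a bias
    return 4 * (d * d + d)


def ffn_params(d, ff):
    # Two linear layers, d -> ff -> d, with biases
    return d * ff + ff + ff * d + d


def layer_norm_params(d):
    # Scale and shift vectors
    return 2 * d


def transformer_params(vocab, d, ff, enc_layers, dec_layers):
    enc_layer = attention_params(d) + ffn_params(d, ff) + 2 * layer_norm_params(d)
    # Decoder layers add cross-attention and a third layer norm
    dec_layer = 2 * attention_params(d) + ffn_params(d, ff) + 3 * layer_norm_params(d)
    # share_embeddings + share_decoder_embeddings: one matrix for source, target and output
    embeddings = vocab * d
    return embeddings + enc_layers * enc_layer + dec_layers * dec_layer


total = transformer_params(vocab=257_000, d=1024, ff=4096, enc_layers=12, dec_layers=12)
print(f"~{total / 1e6:.0f}M parameters")
```

Under this estimate, the shared embedding matrix alone accounts for about 263M of the roughly 616M parameters, which is why the large vocabulary weighs so heavily on the smallest model.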