NLLB-200 with CTranslate2

NLLB-200 is a family of open-source pre-trained machine translation models covering 200 languages. They can be used via fairseq or Hugging Face Transformers. Recently, CTranslate2 introduced inference support for some Transformers models, including NLLB. This tutorial provides ready-to-use models in the CTranslate2 format, along with code examples for running these NLLB models with CTranslate2 and SentencePiece tokenization.

Download NLLB-200 models
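If you prefer to convert an original Hugging Face checkpoint yourself instead of downloading a ready-converted model, CTranslate2 ships a converter for Transformers models. The following is a minimal sketch; the checkpoint name facebook/nllb-200-distilled-600M, the output directory, and the int8 quantization are only examples, and the conversion requires the transformers and torch packages to be installed.

from ctranslate2.converters import TransformersConverter

# Convert a Hugging Face NLLB checkpoint to the CTranslate2 format
# (the ct2-transformers-converter command-line tool does the same job)
converter = TransformersConverter("facebook/nllb-200-distilled-600M")
converter.convert("nllb-200-distilled-600M-int8", quantization="int8")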

Load the model and tokenizer

import ctranslate2
import sentencepiece as spm


# [Modify] Set paths to the CTranslate2 and SentencePiece models
ct_model_path = "nllb-200-3.3B-int8"
sp_model_path = "flores200_sacrebleu_tokenizer_spm.model"

device = "cuda"  # or "cpu"

# Load the source SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load(sp_model_path)

# Load the CTranslate2 translation model
translator = ctranslate2.Translator(ct_model_path, device=device)

Translate a list of sentences

source_sentences = ["Ntabwo ntekereza ko iyi modoka ishaje izagera hejuru yumusozi.",
                    "Kanda iyi buto hanyuma umuryango ukingure",
                    "Ngendahimana yashakaga ikaramu"
                   ]

# Source and target language codes
src_lang = "kin_Latn"
tgt_lang = "eng_Latn"

beam_size = 4

source_sentences = [sent.strip() for sent in source_sentences]
target_prefix = [[tgt_lang]] * len(source_sentences)

# Subword the source sentences
source_sents_subworded = sp.encode_as_pieces(source_sentences)
source_sents_subworded = [sent + ["</s>", src_lang] for sent in source_sents_subworded]
print("First subworded source sentence:", source_sents_subworded[0], sep="\n")

# Translate the source sentences
translations_subworded = translator.translate_batch(source_sents_subworded,
                                                    batch_type="tokens",
                                                    max_batch_size=2024,
                                                    beam_size=beam_size,
                                                    target_prefix=target_prefix)
translations_subworded = [translation.hypotheses[0] for translation in translations_subworded]
for translation in translations_subworded:
  if tgt_lang in translation:
    translation.remove(tgt_lang)

# Desubword the target sentences
translations = sp.decode(translations_subworded)


print("First sentence and translation:", source_sentences[0], translations[0], sep="\n• ")

Output:

Translations:
• I don’t think this old car will make it to the top of the hill.
• Click this button and the door will open.
• Ngendahimana was looking for a pen.
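
For convenience, the steps above can be wrapped into a single helper function. This is a minimal sketch that reuses the translator and sp objects loaded earlier; the function name and its defaults are only illustrative.

def translate_nllb(sentences, src_lang, tgt_lang, beam_size=4):
    """Translate a list of raw sentences with the loaded CTranslate2 NLLB model."""
    sentences = [sent.strip() for sent in sentences]

    # Subword the source sentences and append the </s> and source language tokens
    tokenized = sp.encode_as_pieces(sentences)
    tokenized = [tokens + ["</s>", src_lang] for tokens in tokenized]

    # The target language token is passed as a target prefix
    target_prefix = [[tgt_lang]] * len(sentences)

    results = translator.translate_batch(tokenized,
                                         batch_type="tokens",
                                         max_batch_size=2024,
                                         beam_size=beam_size,
                                         target_prefix=target_prefix)

    # Keep the best hypothesis and drop the leading target language token
    outputs = [result.hypotheses[0] for result in results]
    outputs = [tokens[1:] if tokens[:1] == [tgt_lang] else tokens for tokens in outputs]
    return sp.decode(outputs)


print(translate_nllb(["Ngendahimana yashakaga ikaramu"], "kin_Latn", "eng_Latn"))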


Notebook

You can also use this Google Colab notebook.

Licence of models

CC-BY-NC

Note on language token placement

I just realized that for NLLB, the source language token should come after the source sentence (not before it, as in M2M-100). There is also a </s> token before the source language token. Hence, for the flores200 subword model to work well with SentencePiece, these two tokens must be appended to the source sentence tokens. This step is a must, and it dramatically affects the quality of the translation.

source_sents_subworded = [sent + ["</s>", src_lang] for sent in source_sents_subworded]

Interestingly, the target also starts with the two tokens ["</s>", tgt_lang]. However, as far as I can see in the results, passing only [tgt_lang] as the target prefix is enough.
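
To make the expected layout concrete, here is a minimal sketch of the token sequences NLLB expects, reusing the sp processor loaded earlier (the sentence and language codes are only examples):

# NLLB expects the language token AFTER the source subwords, preceded by </s>:
#   [<source subwords>, "</s>", src_lang]
# (M2M-100, by contrast, puts the language token before the sentence.)
pieces = sp.encode_as_pieces("Ngendahimana yashakaga ikaramu")
source_tokens = pieces + ["</s>", "kin_Latn"]

# On the target side, passing only the target language token as a prefix is enough
target_prefix = ["eng_Latn"]

print(source_tokens)
print(target_prefix)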