
Convert M2M model to CTranslate2

@guillaumekln Thanks for the great ctranslate2 library.

With this release, which supports conversion of Transformer models trained with Fairseq, is it possible to convert Facebook AI's M2M100_418M model too? I can't seem to find straightforward examples of similar models being converted to CTranslate2 so far. The original model is in the Fairseq repository (https://github.com/pytorch/fairseq/tree/master/examples/m2m_100), and there is a Hugging Face Transformers version as well (https://huggingface.co/facebook/m2m100_418M).

I was able to convert the WMT16 model successfully, but M2M100 seems to have quite a different model structure.

Here’s the conversion script I used:

import os
import ctranslate2

data_dir = os.path.join(
    "path",
    "to",
    "wmt16.en-de.joined-dict.transformer"
)

converter = ctranslate2.converters.FairseqConverter(
    os.path.join(data_dir, "model.pt"), data_dir
)
output_dir = os.path.join(data_dir, "ctranslate2_model")

converter.convert(output_dir)

Run from the cloned CTranslate2 repo with:

python3 python/wmt16_converter.py
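The packaged CLI entry point should perform the same conversion (a sketch, assuming ctranslate2 was installed with pip):

ct2-fairseq-converter \
    --model_path path/to/wmt16.en-de.joined-dict.transformer/model.pt \
    --data_dir path/to/wmt16.en-de.joined-dict.transformer \
    --output_dir path/to/wmt16.en-de.joined-dict.transformer/ctranslate2_model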

Many thanks for your help!

Did you try to convert the M2M model? If yes, what errors are reported?

Hi,
This is the script I tried with the Hugging Face M2M100 model:

import os
import ctranslate2

# relative path to where the script is run from
data_dir = os.path.join(
    "path...",
    "m2m100_418M"
)

# huggingface transformer m2m100 model
# Ref: https://huggingface.co/facebook/m2m100_418M
converter = ctranslate2.converters.FairseqConverter(
    os.path.join(data_dir, "pytorch_model.bin"), data_dir
)
output_dir = "/path/m2m_100/ctranslate2_model"
converter.convert(output_dir)

This is the error I got:

python3 python/m2m_100_converter.py

Traceback (most recent call last):
  File "python/m2m_100_converter.py", line 23, in <module>
    converter.convert(output_dir)
  File "/<path>/github.com/OpenNMT/CTranslate2/python/ctranslate2/converters/converter.py", line 45, in convert
    model_spec = self._load()
  File "/<path>/github.com/OpenNMT/CTranslate2/python/ctranslate2/converters/fairseq.py", line 84, in _load
    checkpoint = checkpoint_utils.load_checkpoint_to_cpu(self._model_path)
  File "<path>/Library/Python/3.8/lib/python/site-packages/fairseq/checkpoint_utils.py", line 228, in load_checkpoint_to_cpu
    args = state["args"]
KeyError: 'args'

I also tried the original Fairseq M2M100_418M model (https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) and got a different error.

Script:

import os
import ctranslate2

# relative path to where the script is run from
data_dir = os.path.join(
    "path...",
    "m2m100_original"
)

# original fairseq m2m100 model
# Ref: https://github.com/pytorch/fairseq/tree/master/examples/m2m_100
converter = ctranslate2.converters.FairseqConverter(
    os.path.join(data_dir, "418M_last_checkpoint.pt"), data_dir
)
output_dir = "/path/m2m100_original/ctranslate2_model"
converter.convert(output_dir)

Error:

python3 python/m2m_100_original_converter.py

External language dictionary is not provided; use lang-pairs to infer the set of supported languages. The language ordering is not stable which might cause misalignment in pretraining and finetuning.
Traceback (most recent call last):
  File "python/m2m_100_original_converter.py", line 23, in <module>
    converter.convert(output_dir)
  File "<path>/OpenNMT/CTranslate2/python/ctranslate2/converters/converter.py", line 45, in convert
    model_spec = self._load()
  File "<path>/OpenNMT/CTranslate2/python/ctranslate2/converters/fairseq.py", line 92, in _load
    task = fairseq.tasks.setup_task(args)
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/tasks/__init__.py", line 28, in setup_task
    return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/tasks/translation_multi_simple_epoch.py", line 106, in setup_task
    langs, dicts, training = MultilingualDatasetManager.prepare(
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/data/multilingual/multilingual_data_manager.py", line 371, in prepare
    dicts[lang] = load_dictionary(
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/tasks/fairseq_task.py", line 54, in load_dictionary
    return Dictionary.load(filename)
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/data/dictionary.py", line 214, in load
    d.add_from_file(f)
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/data/dictionary.py", line 227, in add_from_file
    raise fnfe
  File "<path>/Python/3.8/lib/python/site-packages/fairseq/data/dictionary.py", line 224, in add_from_file
    with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '<path>/m2m100_original/dict.af.txt'

We also tried to convert M2M-100 1.2B with ct2-fairseq-converter

https://dl.fbaipublicfiles.com/m2m_100/1.2B_last_checkpoint.pt

and got this error:

✗ aws-fb-test ~ $ ct2-fairseq-converter --model_path /root/fairseq/1.2B_last_checkpoint.pt --data_dir /root/fairseq/ --output_dir /tmp/out --force
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.1/bin/ct2-fairseq-converter", line 8, in <module>
    sys.exit(main())
  File "/root/.pyenv/versions/3.8.1/lib/python3.8/site-packages/ctranslate2/bin/fairseq_converter.py", line 18, in main
    converters.FairseqConverter(args.model_path, args.data_dir).convert_from_args(args)
  File "/root/.pyenv/versions/3.8.1/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 31, in convert_from_args
    return self.convert(
  File "/root/.pyenv/versions/3.8.1/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 45, in convert
    model_spec = self._load()
  File "/root/.pyenv/versions/3.8.1/lib/python3.8/site-packages/ctranslate2/converters/fairseq.py", line 92, in _load
    task = fairseq.tasks.setup_task(args)
  File "/root/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
    return task.setup_task(cfg, **kwargs)
  File "/root/fairseq/fairseq/tasks/translation_multi_simple_epoch.py", line 125, in setup_task
    langs, dicts, training = MultilingualDatasetManager.prepare(
  File "/root/fairseq/fairseq/data/multilingual/multilingual_data_manager.py", line 311, in prepare
    if args.langtoks is None:
AttributeError: 'Namespace' object has no attribute 'langtoks'

I updated the converter to support M2M models:

  • For conversion, pass the path to the single vocabulary file via the fixed_dictionary option (same name as the Fairseq option); see the conversion sketch after the translation example below.
  • For translation, include the language tags in the input like this:
translator.translate_batch(
    [["__en__", "▁Hello", "▁World", "!"], ["__en__", "▁Hello", "▁World", "!"]],
    target_prefix=[["__fr__"], ["__es__"]],
)

This example translates the same English sentence into French and Spanish.
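For the conversion itself, a minimal CLI sketch (the vocabulary file name model_dict.128k.txt is an assumption based on the file shipped in the Fairseq M2M-100 release; note that the original Fairseq checkpoint must be used, since the Hugging Face pytorch_model.bin is a plain state dict without the Fairseq training args, which is what caused the KeyError: 'args' above):

ct2-fairseq-converter \
    --model_path m2m100_original/418M_last_checkpoint.pt \
    --data_dir m2m100_original \
    --fixed_dictionary m2m100_original/model_dict.128k.txt \
    --output_dir m2m100_original/ctranslate2_model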

I tested the 418M model and it seems to work fine, but if you can verify on your side that would be great!
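For reference, an end-to-end sketch of what such a test could look like (the SentencePiece model name spm.128k.model is taken from the Fairseq M2M-100 release, and the result access assumes a recent CTranslate2 Python API):

import ctranslate2
import sentencepiece as spm

# Assumed file names from the Fairseq M2M-100 release; adjust paths as needed.
sp = spm.SentencePieceProcessor(model_file="spm.128k.model")
translator = ctranslate2.Translator("m2m100_original/ctranslate2_model")

# Prefix the tokenized source with its language tag, as described above.
source = ["__en__"] + sp.encode("Hello World!", out_type=str)
results = translator.translate_batch([source], target_prefix=[["__fr__"]])

# Drop the leading target language tag before detokenizing.
print(sp.decode(results[0].hypotheses[0][1:]))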

The 1.2B model converted fine.

But the biggest one, the 12B model (12b_last_chk_4_gpus.pt), gives a conversion error. Is this because it is sharded across several GPUs?

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/r10-MnTosGMW/bin/my-convert", line 13, in <module>
    converter.convert(output_dir)
  File "/root/.local/share/virtualenvs/r10-MnTosGMW/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 45, in convert
    model_spec = self._load()
  File "/root/.local/share/virtualenvs/r10-MnTosGMW/lib/python3.8/site-packages/ctranslate2/converters/fairseq.py", line 94, in _load
    model_spec = _get_model_spec(args)
  File "/root/.local/share/virtualenvs/r10-MnTosGMW/lib/python3.8/site-packages/ctranslate2/converters/fairseq.py", line 61, in _get_model_spec
    utils.raise_unsupported(reasons)
  File "/root/.local/share/virtualenvs/r10-MnTosGMW/lib/python3.8/site-packages/ctranslate2/converters/utils.py", line 16, in raise_unsupported
    raise ValueError(message)
ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

  • Option --arch transformer_wmt_en_de_big_pipeline_parallel is not supported (supported architectures are: transformer_wmt_en_de_big, transformer_tiny, transformer_vaswani_wmt_en_fr_big, transformer_wmt_en_de, transformer, transformer_vaswani_wmt_en_de_big, transformer_iwslt_de_en, transformer_wmt_en_de_big_t2t)

Yes, the 12B version uses a different model architecture so that it can be distributed across several GPUs.

For now it is not supported, but I will check how CTranslate2 handles these gigantic models. A 48GB model is several times bigger than anything we have ever tested.

Is it possible to convert the 12B model architecture to run on a single GPU?

I think the quantized 12B model would only barely run on a 16GB GPU. But before that, there are other issues to address; for example, the converter currently requires about 2 times the model size in memory to run.
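As a rough back-of-the-envelope: 12B parameters at 4 bytes each in float32 is about 48GB, so conversion would need on the order of 96GB of RAM; int8 quantization divides the model size by roughly 4, down to about 12GB, which is why it would only barely fit on a 16GB GPU.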