Error: Unable to convert model from OpenNMT-py to CTranslate2

Hello. I am a beginner with OpenNMT, and I am trying to convert my self-made OpenNMT-py model to a CTranslate2 model.

However, I am encountering the following error. When creating the OpenNMT-py model, I set "self_attn_type" to "scaled-dot", and my config.yaml was created by referring to GitHub - ymoslem/OpenNMT-Tutorial: Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment.

Could anyone please advise me on how to solve this problem? Thank you very much in advance.

Command

ct2-opennmt-py-converter --model_path model.ensl_step_14000.pt  --output_dir enslo_ctranslate

Error message

"ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

- Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)"

config.yaml

save_data: run

data:
    corpus_1:
        path_src: Moodle.en-sl.en-filtered.en.subword.train
        path_tgt: Moodle.en-sl.sl-filtered.sl.subword.train
        transforms: [filtertoolong]
        weight: 2

    valid:
        path_src: Moodle.en-sl.en-filtered.en.subword.dev
        path_tgt: Moodle.en-sl.sl-filtered.sl.subword.dev
        transforms: [filtertoolong]
        weight: 2

train_from: "model_back/model.ensl_step2_11000.pt"
update_vocab: true
reset_optim: "states"
self_attn_type: "scaled-dot"

src_vocab: run/source.vocab
tgt_vocab: run/target.vocab

src_vocab_size: 50000
tgt_vocab_size: 50000

src_seq_length: 150
tgt_seq_length: 150

src_subword_model: source.model
tgt_subword_model: target.model

log_file: train.log
save_model: models/model.ensl

early_stopping: 2

save_checkpoint_steps: 1000

seed: 3435

train_steps: 14000

valid_steps: 1000

warmup_steps: 500
report_every: 100

world_size: 1
gpu_ranks: [0]

bucket_size: 262144
num_workers: 0
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 2048
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

model_dtype: "fp16"
optim: "adam"
learning_rate: 0.3
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0, 11000, 12000, 13000]
dropout: [0.3, 0.5, 0.4, 0.4]
attention_dropout: [0.1, 0.1, 0.1, 0.1]

It's a bug.
I'll need to fix it.
If you want to bypass it, just modify here:

with:
self_attn_type = "scaled-dot"
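
For illustration only, a minimal sketch of that bypass, assuming the check lives in CTranslate2's OpenNMT-py converter as quoted further down in this thread (the exact file and surrounding code may differ between versions):

# In the converter's check_opt, replace the line that reads the option
# from the checkpoint:
#     self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")
# with the hard-coded supported value:
self_attn_type = "scaled-dot"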


Hello Vincent,

Thank you very much for your prompt reply, and I will correct the code as you instructed.

Thank you very much once again, and I hope you have a great weekend!

Hello. I am getting the same error when trying to convert a transformer_lm model generated according to the wiki_103 example in the OpenNMT documentation (OpenNMT-py/docs/source/examples/wiki_103/LanguageModelGeneration.md at master · OpenNMT/OpenNMT-py · GitHub).

My config.yaml is:

num_workers: 0

seed: 42
share_vocab: true
save_data: data/wikitext-103-raw/run/example

# Where the vocab(s) will be written
src_vocab: data/wikitext-103-raw/run/example.vocab.src
src_vocab_size: 60000
tgt_vocab_size: 60000
src_subword_type: bpe
src_subword_model: data/wikitext-103-raw/subwords.bpe
src_onmttok_kwargs: '{"mode": "aggressive", "joiner_annotate": True, "preserve_placeholders": True, "case_markup": True, "soft_case_regions": True, "preserve_segmented_tokens": True}'
transforms: [onmt_tokenize, filtertoolong]
src_seq_length: 512
tgt_seq_length: 512

# Prevent overwriting existing files in the folder
overwrite: True

# Corpus opts:
data:
    corpus_1:
        path_src: data/wikitext-103-raw/wiki.train.raw
    valid:
        path_src: data/wikitext-103-raw/wiki.valid.raw

# Vocabulary files that were just created
src_vocab: data/wikitext-103-raw/run/example.vocab.src

# Train on a single GPU
world_size: 1
gpu_ranks: [0]

# Where to save the checkpoints
save_model: data/wikitext-103-raw/run/model-lm
save_checkpoint_steps: 1000 #500 #50000
train_from: data/wikitext-103-raw/run/model-lm_step_101000.pt
train_steps: 1000000
valid_steps: 500
report_every: 100
tensorboard: true
tensorboard_log_dir: data/wikitext-103-raw/run/tensorboard

# Model
model_task: lm
encoder_type: transformer_lm
decoder_type: transformer_lm
position_encoding: true
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
batch_size: 2048
batch_type: tokens

model_dtype: "fp32"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

When converting it to CTranslate2, I get the same error as above:

ct2-opennmt-py-converter --model_path data/wikitext-103-raw/run/model-lm_step_108000.pt --output_dir ct2_model

ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

  • Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)

Any hint to overcome the conversion error?

Thank you so much,

Giuliano Lancioni


Same error, same fix; please read my answer above.

The problem is that I am using the latest Docker image under WSL2, and when I try to recompile from source I get a "libiomp5 library not found" error (which I think is due to the CUDA libraries being precompiled) that I couldn't overcome.

Strangely enough, the installed Python source in the Docker image already seems to match your suggestion.

Could you kindly advise me on how to fix it when using the Docker image?

Thank you again,

Giuliano

Actually, the master branch looks identical to the change you suggested, so it isn't clear what is expected to change there.

def check_opt(opt, num_source_embeddings):
    with_relative_position = getattr(opt, "max_relative_positions", 0) > 0
    with_rotary = getattr(opt, "max_relative_positions", 0) == -1
    with_alibi = getattr(opt, "max_relative_positions", 0) == -2
    activation_fn = getattr(opt, "pos_ffn_activation_fn", "relu")
    feat_merge = getattr(opt, "feat_merge", "concat")
    self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")

    check = utils.ConfigurationChecker()
    check(
        opt.encoder_type == opt.decoder_type
        and opt.decoder_type in {"transformer", "transformer_lm"},
        "Options --encoder_type and --decoder_type must be"
        " 'transformer' or 'transformer_lm",
    )
    check(
        self_attn_type == "scaled-dot",
        "Option --self_attn_type %s is not supported (supported values are: scaled-dot)"
        % self_attn_type,
    )
    check(
        activation_fn in _SUPPORTED_ACTIVATIONS,
        "Option --pos_ffn_activation_fn %s is not supported (supported activations are: %s)"
        % (activation_fn, ", ".join(_SUPPORTED_ACTIVATIONS.keys())),
    )
    check(
        opt.position_encoding != (with_relative_position or with_rotary or with_alibi),
        "Options --position_encoding and --max_relative_positions cannot be both enabled "
        "or both disabled",
    )
    check(
        num_source_embeddings == 1 or feat_merge in _SUPPORTED_FEATURES_MERGE,
        "Option --feat_merge %s is not supported (supported merge modes are: %s)"
        % (feat_merge, " ".join(_SUPPORTED_FEATURES_MERGE.keys())),
    )
    check.validate()

My message says: replace this
self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")
with this:
self_attn_type = "scaled-dot"

This is just a hack; I will make a PR next week.
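
If rebuilding from source is not practical (for example inside the Docker image mentioned above), one possible way to apply the same hack is to edit the installed converter in place. This is only a sketch: it assumes check_opt lives in ctranslate2/converters/opennmt_py.py and that the line matches the version quoted above.

# Patch the installed CTranslate2 OpenNMT-py converter in place
# (assumption: check_opt is in ctranslate2/converters/opennmt_py.py).
import ctranslate2.converters.opennmt_py as opennmt_py

path = opennmt_py.__file__
with open(path) as f:
    source = f.read()

# Force the only self-attention type the converter accepts.
patched = source.replace(
    'self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")',
    'self_attn_type = "scaled-dot"',
)

with open(path, "w") as f:
    f.write(patched)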

Very well, thank you!

In my specific case, it sufficed to load the .pt file with torch.load, change model['opt'].self_attn_type to 'scaled-dot' (the configuration had 'scaled-dot-flash' by default), and save it again with torch.save. It worked as a workaround.
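
A minimal sketch of that checkpoint-side workaround, using the checkpoint path from the conversion command above:

# Rewrite the stored training option so the CTranslate2 check passes.
import torch

checkpoint_path = "data/wikitext-103-raw/run/model-lm_step_108000.pt"

# Load the OpenNMT-py checkpoint on CPU.
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# The training options are stored under the "opt" key; change
# "scaled-dot-flash" to the value CTranslate2 supports.
checkpoint["opt"].self_attn_type = "scaled-dot"

# Save the patched checkpoint back (here, over the original file).
torch.save(checkpoint, checkpoint_path)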

And I managed to patch the Docker image. Thanks again.