Error: Unable to convert model from OpenNMT-py to CTranslate2

Hello. I am a beginner with OpenNMT, and I am trying to convert my self-made OpenNMT-py model to a CTranslate2 model.

However, I am encountering the following error. When creating the OpenNMT-py model, I set "self_attn_type" to "scaled-dot", and my config.yaml was created by referring to GitHub - ymoslem/OpenNMT-Tutorial: Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment.

Could anyone please advise me on how to solve this problem? Thank you very much in advance.

Command

ct2-opennmt-py-converter --model_path model.ensl_step_14000.pt  --output_dir enslo_ctranslate

Error message

"ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

- Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)"

config.yaml

save_data: run

data:
    corpus_1:
        path_src: Moodle.en-sl.en-filtered.en.subword.train
        path_tgt: Moodle.en-sl.sl-filtered.sl.subword.train
        transforms: [filtertoolong]
        weight: 2

    valid:
        path_src: Moodle.en-sl.en-filtered.en.subword.dev
        path_tgt: Moodle.en-sl.sl-filtered.sl.subword.dev
        transforms: [filtertoolong]
        weight: 2

train_from: "model_back/model.ensl_step2_11000.pt"
update_vocab: true
reset_optim: "states"
self_attn_type: "scaled-dot"

src_vocab: run/source.vocab
tgt_vocab: run/target.vocab

src_vocab_size: 50000
tgt_vocab_size: 50000

src_seq_length: 150
tgt_seq_length: 150

src_subword_model: source.model
tgt_subword_model: target.model

log_file: train.log
save_model: models/model.ensl

early_stopping: 2

save_checkpoint_steps: 1000

seed: 3435

train_steps: 14000

valid_steps: 1000

warmup_steps: 500
report_every: 100

world_size: 1
gpu_ranks: [0]

bucket_size: 262144
num_workers: 0
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 2048
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

model_dtype: "fp16"
optim: "adam"
learning_rate: 0.3
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0, 11000, 12000, 13000]
dropout: [0.3, 0.5, 0.4, 0.4]
attention_dropout: [0.1, 0.1, 0.1, 0.1]

It's a bug.
I'll need to fix it.
If you want to bypass it, just modify here:

with:
self_attn_type = "scaled-dot"
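
For illustration only, a minimal sketch of that bypass, assuming the check lives in CTranslate2's OpenNMT-py converter as quoted further down in this thread (the exact file and surrounding code may differ between versions):

# In the converter's check_opt, replace the line that reads the option
# from the checkpoint:
#     self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")
# with the hard-coded supported value:
self_attn_type = "scaled-dot"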


Hello Vincent,

Thank you very much for your prompt reply, and I will correct the code as you instructed.

Thank you very much once again, and I hope you have a great weekend!

Hello. I am getting the same error when trying to convert a transformer_lm model generated according to the wiki_103 example in the OpenNMT documentation (OpenNMT-py/docs/source/examples/wiki_103/LanguageModelGeneration.md at master · OpenNMT/OpenNMT-py · GitHub).

My config.yaml is:

num_workers: 0

seed: 42
share_vocab: true
save_data: data/wikitext-103-raw/run/example

# Where the vocab(s) will be written
src_vocab: data/wikitext-103-raw/run/example.vocab.src
src_vocab_size: 60000
tgt_vocab_size: 60000
src_subword_type: bpe
src_subword_model: data/wikitext-103-raw/subwords.bpe
src_onmttok_kwargs: '{"mode": "aggressive", "joiner_annotate": True, "preserve_placeholders": True, "case_markup": True, "soft_case_regions": True, "preserve_segmented_tokens": True}'
transforms: [onmt_tokenize, filtertoolong]
src_seq_length: 512
tgt_seq_length: 512

# Prevent overwriting existing files in the folder
overwrite: True

# Corpus opts:
data:
    corpus_1:
        path_src: data/wikitext-103-raw/wiki.train.raw
    valid:
        path_src: data/wikitext-103-raw/wiki.valid.raw

# Vocabulary files that were just created
src_vocab: data/wikitext-103-raw/run/example.vocab.src

# Train on a single GPU
world_size: 1
gpu_ranks: [0]

# Where to save the checkpoints
save_model: data/wikitext-103-raw/run/model-lm
save_checkpoint_steps: 1000 #500 #50000
train_from: data/wikitext-103-raw/run/model-lm_step_101000.pt
train_steps: 1000000
valid_steps: 500
report_every: 100
tensorboard: true
tensorboard_log_dir: data/wikitext-103-raw/run/tensorboard

# Model
model_task: lm
encoder_type: transformer_lm
decoder_type: transformer_lm
position_encoding: true
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
batch_size: 2048
batch_type: tokens

model_dtype: "fp32"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

When converting it to CTranslate2, I get the same error as above:

ct2-opennmt-py-converter --model_path data/wikitext-103-raw/run/model-lm_step_108000.pt --output_dir ct2_model

ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

  • Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)

Any hint to overcome the conversion error?

Thank you so much,

Giuliano Lancioni


Same error, same fix; please read my answer above.

The problem is that I am using the latest Docker image under WSL2, and when I try to recompile from source I get a "libiomp5 library not found" error (which I think is due to the CUDA libraries being precompiled) that I couldn't overcome.

Strangely enough, the installed Python source in the Docker image already seems to match your suggestion.

Could you kindly advise me on how to fix it when using the Docker image?

Thank you again,

Giuliano

Actually, the master branch looks identical to the change you suggested, so it isn't clear what is expected to change there.

def check_opt(opt, num_source_embeddings):
    with_relative_position = getattr(opt, "max_relative_positions", 0) > 0
    with_rotary = getattr(opt, "max_relative_positions", 0) == -1
    with_alibi = getattr(opt, "max_relative_positions", 0) == -2
    activation_fn = getattr(opt, "pos_ffn_activation_fn", "relu")
    feat_merge = getattr(opt, "feat_merge", "concat")
    self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")

    check = utils.ConfigurationChecker()
    check(
        opt.encoder_type == opt.decoder_type
        and opt.decoder_type in {"transformer", "transformer_lm"},
        "Options --encoder_type and --decoder_type must be"
        " 'transformer' or 'transformer_lm",
    )
    check(
        self_attn_type == "scaled-dot",
        "Option --self_attn_type %s is not supported (supported values are: scaled-dot)"
        % self_attn_type,
    )
    check(
        activation_fn in _SUPPORTED_ACTIVATIONS,
        "Option --pos_ffn_activation_fn %s is not supported (supported activations are: %s)"
        % (activation_fn, ", ".join(_SUPPORTED_ACTIVATIONS.keys())),
    )
    check(
        opt.position_encoding != (with_relative_position or with_rotary or with_alibi),
        "Options --position_encoding and --max_relative_positions cannot be both enabled "
        "or both disabled",
    )
    check(
        num_source_embeddings == 1 or feat_merge in _SUPPORTED_FEATURES_MERGE,
        "Option --feat_merge %s is not supported (supported merge modes are: %s)"
        % (feat_merge, " ".join(_SUPPORTED_FEATURES_MERGE.keys())),
    )
    check.validate()

My message says: replace this
self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")
with this:
self_attn_type = "scaled-dot"

This is just a hack; I will make a PR next week.
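
If rebuilding from source is not practical (for example inside the Docker image mentioned above), one possible way to apply the same hack is to edit the installed converter in place. This is only a sketch: it assumes check_opt lives in ctranslate2/converters/opennmt_py.py and that the line matches the version quoted above.

# Patch the installed CTranslate2 OpenNMT-py converter in place
# (assumption: check_opt is in ctranslate2/converters/opennmt_py.py).
import ctranslate2.converters.opennmt_py as opennmt_py

path = opennmt_py.__file__
with open(path) as f:
    source = f.read()

# Force the only self-attention type the converter accepts.
patched = source.replace(
    'self_attn_type = getattr(opt, "self_attn_type", "scaled-dot")',
    'self_attn_type = "scaled-dot"',
)

with open(path, "w") as f:
    f.write(patched)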

Very well, thank you!

In my specific case, it sufficed to load the .pt file with torch.load, change model['opt'].self_attn_type to 'scaled-dot' (the configuration had 'scaled-dot-flash' by default), and save it again with torch.save. It worked as a workaround.
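
A minimal sketch of that checkpoint-side workaround, using the checkpoint path from the conversion command above:

# Rewrite the stored training option so the CTranslate2 check passes.
import torch

checkpoint_path = "data/wikitext-103-raw/run/model-lm_step_108000.pt"

# Load the OpenNMT-py checkpoint on CPU.
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# The training options are stored under the "opt" key; change
# "scaled-dot-flash" to the value CTranslate2 supports.
checkpoint["opt"].self_attn_type = "scaled-dot"

# Save the patched checkpoint back (here, over the original file).
torch.save(checkpoint, checkpoint_path)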

And I managed to patch the Docker image. Thanks again.