OpenNMT-py model converted to CTranslate2: the converted model generates incorrect text

The converted CTranslate2 model generates repeated text no matter what I translate, while the original PyTorch model translates normally.

Please provide more information:

  • the model you trained (post the training command line)
  • how you converted the model (with onmt_release_model or other?)
  • how you run the translation, what is the input

  1. onmt_train -config=en-th-train.yaml
  2. ct2-opennmt-py-converter --model_path --output_dir ctranslate2-model-int8 --model_spec TransformerBig --quantization int8
  3. translator.translate_batch([['LANG_TOK_EN', 'h@@', 'app@@', 'y', 'bir@@', 'th@@', 'day']], beam_size=2, target_prefix=[['LANG_TOK_TH']])

We specify target_prefix when generating text, which matches the way the model was trained.
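
For reference, the call in step 3 can be written out as the minimal sketch below. It assumes a recent CTranslate2 Python release in which `translate_batch` returns result objects with a `hypotheses` attribute. The helper names `build_batch` and `translate` and the CPU device are illustrative choices, not part of the original post.

```python
from typing import List, Sequence, Tuple


def build_batch(
    bpe_tokens: Sequence[str], src_lang_token: str, tgt_lang_token: str
) -> Tuple[List[List[str]], List[List[str]]]:
    """Build a one-sentence batch: language token + BPE tokens, plus the target prefix."""
    source = [[src_lang_token] + list(bpe_tokens)]
    target_prefix = [[tgt_lang_token]]
    return source, target_prefix


def translate(model_dir: str, source, target_prefix, beam_size: int = 2):
    """Translate with a converted model directory (hypothetical helper)."""
    # Imported lazily so that build_batch can be used without CTranslate2 installed.
    import ctranslate2

    translator = ctranslate2.Translator(model_dir, device="cpu")
    results = translator.translate_batch(
        source, target_prefix=target_prefix, beam_size=beam_size
    )
    # Assumption: each result exposes its best hypothesis as hypotheses[0].
    return [result.hypotheses[0] for result in results]


source, target_prefix = build_batch(
    ["h@@", "app@@", "y", "bir@@", "th@@", "day"], "LANG_TOK_EN", "LANG_TOK_TH"
)
# translate("ctranslate2-model-int8", source, target_prefix) would run the model.
```

Running the same batch through both the PyTorch model and the converted model makes it easier to confirm that the difference comes from the conversion rather than from the input.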

What is the content of the en-th-train.yaml file?


# Where the samples will be written

save_data: /home/work/user-job-dir/fairseq-080/onmt_test1/example

# Where the vocab(s) will be written

src_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai
tgt_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai

# Prevent overwriting existing files in the folder

overwrite: False

# Corpus opts:

path_src: /home/work/user-job-dir/opennmt-py/data/en-th/
path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/
path_src: /home/work/user-job-dir/opennmt-py/data/en-th/
path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/

# Train on a single GPU

#world_size: 1
#gpu_ranks: [0]

#src_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#tgt_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#src_subword_type: bpe
#tgt_subword_type: bpe
share_vocab: true
src_vocab_size: 64867
tgt_vocab_size: 64867

# Where to save the checkpoints

save_model: /home/work/user-job-dir/opennmt-py/output/en-th
save_checkpoint_steps: 20000
train_steps: 60000
valid_steps: 2000

#decoder_type: transformer
#encoder_type: transformer

#layers: 6

# General opts

#save_model: foo
#save_checkpoint_steps: 10000
#valid_steps: 10000
#train_steps: 200000


queue_size: 10000
bucket_size: 32768
world_size: 8
gpu_ranks: [0, 1, 2, 3, 4, 5, 6, 7]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 8
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

train_from: /home/work/user-job-dir/opennmt-py/model/pretrain/
reset_optim: all


model_dtype: "fp32"
optim: "adam"
learning_rate: 0.5 #was 0.001 # was 1.0 0.5
warmup_steps: 4000
decay_method: "noam"
adam_beta1: 0.9
adam_beta2: 0.98
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"


encoder_type: transformer
decoder_type: transformer
position_encoding: true # ? 'no_token_positional_embeddings', False
enc_layers: 6 #0
dec_layers: 6 #0
heads: 16 # ? 'encoder_attention_heads', 16; 'decoder_attention_heads', 16
rnn_size: 1024
word_vec_size: 1024
transformer_ff: 4096 # ? args, 'encoder_ffn_embed_dim', 4096; 'decoder_ffn_embed_dim', 4096
dropout_steps: [0]
dropout: [0.1] # args, 'dropout', 0.2 #was 0.2
attention_dropout: [0.1] #0

pos_ffn_activation_fn: "gelu"
share_embeddings: true
share_decoder_embeddings: true
src_seq_length: 300
tgt_seq_length: 300

src_seq_length_trunc: 300
tgt_seq_length_trunc: 300

Unfortunately the GELU activation set by pos_ffn_activation_fn is not used in the original Transformer architecture, so CTranslate2 uses the ReLU activation by default.
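
To illustrate why running GELU-trained weights through a ReLU feed-forward layer changes the output, here is a small sketch comparing the two functions. It uses the exact erf-based GELU formula; whether the training framework uses this form or a tanh approximation is an assumption that does not affect the point.

```python
import math


def relu(x: float) -> float:
    """ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# The two activations disagree most for small inputs around zero, where
# GELU passes a scaled (and for x < 0, negative) value that ReLU drops.
for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

Because every feed-forward layer applies this function, the small per-unit differences compound across the 6 encoder and 6 decoder layers, which is consistent with the degenerate output reported above.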

We could easily support GELU during conversion, but did you get better results with GELU vs. ReLU?

In fact, we haven't compared them; GELU is mentioned in the paper "Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information".

Is there any way to change the activation function to GELU when converting the model?

No, there is no such option at the moment. If you want to use CTranslate2, you should train a standard Transformer model (so with ReLU activation instead of GELU).
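
As a precaution before converting, the training config can be checked for a non-ReLU activation. The sketch below is a hypothetical helper, not part of OpenNMT-py or CTranslate2. It scans the YAML text line by line without a YAML library and assumes that OpenNMT-py falls back to ReLU when `pos_ffn_activation_fn` is not set.

```python
import re

# Matches an uncommented "pos_ffn_activation_fn: value" line, stopping at any trailing comment.
_OPTION = re.compile(r"^\s*pos_ffn_activation_fn\s*:\s*(?P<value>[^#]*)")


def ffn_activation(config_text: str, default: str = "relu") -> str:
    """Return the feed-forward activation named in the config, or the assumed default."""
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out options
        match = _OPTION.match(line)
        if match:
            # Strip whitespace and straight or curly quotes around the value.
            return match.group("value").strip().strip("\"'“”").lower()
    return default


def uses_standard_activation(config_text: str) -> bool:
    """True when the config keeps the ReLU activation that the converter expects."""
    return ffn_activation(config_text) == "relu"


example = 'heads: 16\npos_ffn_activation_fn: "gelu"\nshare_embeddings: true\n'
print(ffn_activation(example), uses_standard_activation(example))
```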


For reference, this PR is adding support for converting models with GELU activations:


Hi @guillaumekln,

Are there any thoughts to support Swish in OpenNMT-tf/py? It is reported that it performs well for large transformers.


If you are using OpenNMT-tf, you can already customize the activation function in a custom model definition. See the ffn_activation argument of the Transformer model. The Swish activation is available in TensorFlow as tf.nn.silu.

It can be added later to CTranslate2, if needed.
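
The custom model definition mentioned above could look roughly like the following sketch. It assumes OpenNMT-tf 2.x with a TensorFlow version that provides `tf.nn.silu`. The layer sizes mirror the OpenNMT-py config earlier in this thread and are illustrative only, and the pure-Python `swish` function is included just to show the formula.

```python
import math


def swish(x: float) -> float:
    """Reference Swish/SiLU, x * sigmoid(x), shown here for a single float."""
    return x / (1.0 + math.exp(-x))


def model():
    """Custom model definition with a Swish feed-forward activation (sketch)."""
    # Imported here so the reference function above works without TensorFlow installed.
    import opennmt
    import tensorflow as tf

    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=1024),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=1024),
        num_layers=6,
        num_units=1024,
        num_heads=16,
        ffn_inner_dim=4096,
        ffn_activation=tf.nn.silu,  # Swish instead of the default ReLU
    )
```

Saved as a Python file, a definition like this would be passed to OpenNMT-tf as a custom model. A model trained this way would only be usable in CTranslate2 once Swish support is added there.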


Great, thanks!