The CTranslate2 model generates repeated text no matter what I translate, but the PyTorch model behaves normally.
Please provide more information:
- the model you trained (post the training command line)
- how you converted the model (with onmt_release_model or other?)
- how you run the translation, and what the input is
- onmt_train -config=en-th-train.yaml
- ct2-opennmt-py-converter --model_path en-th_step_60000.pt --output_dir ctranslate2-model-int8 --model_spec TransformerBig --quantization int8
- translator.translate_batch([['LANG_TOK_EN', 'h@@', 'app@@', 'y', 'bir@@', 'th@@', 'day']], beam_size=2, target_prefix=[['LANG_TOK_TH']])
We specify target_prefix when generating text; this is the same process we used when training the model.
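For reference, here is a minimal sketch of how we load and call the converted model (assuming the ctranslate2-model-int8 directory produced by the converter command above; the result structure differs slightly between CTranslate2 1.x and 2.x):

```python
import ctranslate2

# Load the converted (int8-quantized) model directory from the converter step.
translator = ctranslate2.Translator("ctranslate2-model-int8", device="cpu")

# Source tokens use the same BPE segmentation as training, and the target
# prefix forces the target language token, exactly as during training.
source = [["LANG_TOK_EN", "h@@", "app@@", "y", "bir@@", "th@@", "day"]]
results = translator.translate_batch(
    source,
    beam_size=2,
    target_prefix=[["LANG_TOK_TH"]],
)

# CTranslate2 >= 2.0 returns TranslationResult objects; 1.x returns a list of
# hypothesis dicts per example (results[0][0]["tokens"]).
print(results[0].hypotheses[0])
```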
What is the content of this file?
toy_en_de.yaml
# Where the samples will be written
save_data: /home/work/user-job-dir/fairseq-080/onmt_test1/example
# Where the vocab(s) will be written
src_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai
tgt_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai
# Prevent overwriting existing files in the folder
overwrite: False

# Corpus opts:
data:
    corpus_1:
        path_src: /home/work/user-job-dir/opennmt-py/data/en-th/train.tok.bpe.mrasp.new.en
        path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/train.tok.bpe.mrasp.new.th
    valid:
        path_src: /home/work/user-job-dir/opennmt-py/data/en-th/eval.tok.bpe.mrasp.new.en
        path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/eval.tok.bpe.mrasp.new.th

# Train on a single GPU
#world_size: 1
#gpu_ranks: [0]

#src_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#tgt_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#src_subword_type: bpe
#tgt_subword_type: bpe

share_vocab: true
src_vocab_size: 64867
tgt_vocab_size: 64867

# Where to save the checkpoints
save_model: /home/work/user-job-dir/opennmt-py/output/en-th
save_checkpoint_steps: 20000
train_steps: 60000
valid_steps: 2000

#decoder_type: transformer
#encoder_type: transformer
#layers: 6

# General opts
#save_model: foo
#save_checkpoint_steps: 10000
#valid_steps: 10000
#train_steps: 200000

# Batching
queue_size: 10000
bucket_size: 32768
world_size: 8
gpu_ranks: [0, 1, 2, 3, 4, 5, 6, 7]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 8
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

train_from: /home/work/user-job-dir/opennmt-py/model/pretrain/model_step_50000_c1.pt
reset_optim: all

# Optimization
model_dtype: "fp32"
optim: "adam"
learning_rate: 0.5  # was 0.001  # was 1.0 0.5
warmup_steps: 4000
decay_method: "noam"
adam_beta1: 0.9
adam_beta2: 0.98
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: true  # ? 'no_token_positional_embeddings', False
enc_layers: 6  # 0
dec_layers: 6  # 0
heads: 16  # ? 'encoder_attention_heads', 16; 'decoder_attention_heads', 16
rnn_size: 1024
word_vec_size: 1024
transformer_ff: 4096  # ? args, 'encoder_ffn_embed_dim', 4096; 'decoder_ffn_embed_dim', 4096
dropout_steps: [0]
dropout: [0.1]  # args, 'dropout', 0.2  # was 0.2
attention_dropout: [0.1]  # 0
pos_ffn_activation_fn: "gelu"
share_embeddings: true
share_decoder_embeddings: true
src_seq_length: 300
tgt_seq_length: 300
src_seq_length_trunc: 300
tgt_seq_length_trunc: 300
Unfortunately, this activation is not used in the original Transformer architecture, so CTranslate2 uses the ReLU activation by default.
We could easily support GELU during conversion, but did you get better results with GELU vs. ReLU?
In fact, we haven't compared it; GELU is mentioned in the paper (Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information).
Is there any way I can change the activation function to GELU when converting the model?
No, there is no such option at the moment. If you want to use CTranslate2, you should train a standard Transformer model (so with ReLU activation instead of GELU).
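In the config above, that means replacing the GELU setting with the default before training the model that will be converted, e.g.:

```yaml
# ReLU is the OpenNMT-py default and is what CTranslate2 currently supports.
pos_ffn_activation_fn: "relu"
```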
Okay, thanks.
For reference, this PR is adding support for converting models with GELU activations:
Hi @guillaumekln,
Are there any plans to support Swish in OpenNMT-tf/py? It is reported to perform well for large Transformers.
Hi,
If you are using OpenNMT-tf, you can already customize the activation function in a custom model definition. See the ffn_activation argument of the Transformer model. The Swish activation is available in TensorFlow as tf.nn.silu.
It can be added later to CTranslate2, if needed.
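For example, a custom model definition passing Swish through ffn_activation could look like the sketch below (assuming OpenNMT-tf 2.x and TensorFlow >= 2.4; the layer sizes here are only placeholders):

```python
import tensorflow as tf
import opennmt

# Custom model definition file (e.g. model.py), used as:
#   onmt-main --model model.py --config data.yml --auto_config train
def model():
    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        num_layers=6,
        num_units=512,
        num_heads=8,
        ffn_inner_dim=2048,
        # Swish/SiLU activation for the feed-forward layers.
        ffn_activation=tf.nn.silu,
    )
```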
Great, thanks!