The CTranslate2 model generates repeated text no matter what I translate, but the PyTorch model behaves normally.
Please provide more information:
- the model you trained (post the training command line)
- how you converted the model (with onmt_release_model or other?)
- how you run the translation, and what the input is
- onmt_train -config=en-th-train.yaml
- ct2-opennmt-py-converter --model_path en-th_step_60000.pt --output_dir ctranslate2-model-int8 --model_spec TransformerBig --quantization int8
- translator.translate_batch([['LANG_TOK_EN', 'h@@', 'app@@', 'y', 'bir@@', 'th@@', 'day']], beam_size=2, target_prefix=[['LANG_TOK_TH']])
We specify target_prefix when generating text; this is the same process we used when training the model.
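For reference, here is a minimal sketch of how we load and call the converted model (assuming the ctranslate2-model-int8 directory produced by the converter command above; the result structure differs slightly between CTranslate2 1.x and 2.x):

```python
import ctranslate2

# Load the converted (int8-quantized) model directory from the converter step.
translator = ctranslate2.Translator("ctranslate2-model-int8", device="cpu")

# Source tokens use the same BPE segmentation as training, and the target
# prefix forces the target language token, exactly as during training.
source = [["LANG_TOK_EN", "h@@", "app@@", "y", "bir@@", "th@@", "day"]]
results = translator.translate_batch(
    source,
    beam_size=2,
    target_prefix=[["LANG_TOK_TH"]],
)

# CTranslate2 >= 2.0 returns TranslationResult objects; 1.x returns a list of
# hypothesis dicts per example (results[0][0]["tokens"]).
print(results[0].hypotheses[0])
```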
What is the content of this file?
toy_en_de.yaml
# Where the samples will be written
save_data: /home/work/user-job-dir/fairseq-080/onmt_test1/example
# Where the vocab(s) will be written
src_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai
tgt_vocab: /home/work/user-job-dir/opennmt-py/data/vocab.bpe.32000_gai
# Prevent overwriting existing files in the folder
overwrite: False

# Corpus opts:
data:
    corpus_1:
        path_src: /home/work/user-job-dir/opennmt-py/data/en-th/train.tok.bpe.mrasp.new.en
        path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/train.tok.bpe.mrasp.new.th
    valid:
        path_src: /home/work/user-job-dir/opennmt-py/data/en-th/eval.tok.bpe.mrasp.new.en
        path_tgt: /home/work/user-job-dir/opennmt-py/data/en-th/eval.tok.bpe.mrasp.new.th

# Train on a single GPU
#world_size: 1
#gpu_ranks: [0]

#src_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#tgt_subword_model: /home/work/user-job-dir/opennmt-py/data/codes.bpe.32000
#src_subword_type: bpe
#tgt_subword_type: bpe

share_vocab: true
src_vocab_size: 64867
tgt_vocab_size: 64867

# Where to save the checkpoints
save_model: /home/work/user-job-dir/opennmt-py/output/en-th
save_checkpoint_steps: 20000
train_steps: 60000
valid_steps: 2000

#decoder_type: transformer
#encoder_type: transformer
#layers: 6

# General opts
#save_model: foo
#save_checkpoint_steps: 10000
#valid_steps: 10000
#train_steps: 200000

# Batching
queue_size: 10000
bucket_size: 32768
world_size: 8
gpu_ranks: [0, 1, 2, 3, 4, 5, 6, 7]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 8
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

train_from: /home/work/user-job-dir/opennmt-py/model/pretrain/model_step_50000_c1.pt
reset_optim: all

# Optimization
model_dtype: "fp32"
optim: "adam"
learning_rate: 0.5  # was 0.001  # was 1.0 0.5
warmup_steps: 4000
decay_method: "noam"
adam_beta1: 0.9
adam_beta2: 0.98
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: true  # ? 'no_token_positional_embeddings', False
enc_layers: 6  # 0
dec_layers: 6  # 0
heads: 16  # ? 'encoder_attention_heads', 16; 'decoder_attention_heads', 16
rnn_size: 1024
word_vec_size: 1024
transformer_ff: 4096  # ? args, 'encoder_ffn_embed_dim', 4096; 'decoder_ffn_embed_dim', 4096
dropout_steps: [0]
dropout: [0.1]  # args, 'dropout', 0.2  # was 0.2
attention_dropout: [0.1]  # 0
pos_ffn_activation_fn: "gelu"
share_embeddings: true
share_decoder_embeddings: true
src_seq_length: 300
tgt_seq_length: 300
src_seq_length_trunc: 300
tgt_seq_length_trunc: 300
Unfortunately, this activation is not used in the original Transformer architecture, so CTranslate2 uses the ReLU activation by default.
We could easily support GELU during conversion, but did you get better results with GELU vs. ReLU?
In fact, we haven't compared it; GELU is mentioned in the paper (Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information).
Is there any way I can change the activation function to GELU when converting the model?
No, there is no such option at the moment. If you want to use CTranslate2, you should train a standard Transformer model (so with ReLU activation instead of GELU).
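In the config above, that means replacing the GELU setting with the default before training the model that will be converted, e.g.:

```yaml
# ReLU is the OpenNMT-py default and is what CTranslate2 currently supports.
pos_ffn_activation_fn: "relu"
```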
Okay, thanks.
For reference, this PR is adding support for converting models with GELU activations:
Hi @guillaumekln,
Are there any plans to support Swish in OpenNMT-tf/py? It is reported to perform well for large Transformers.
Hi,
If you are using OpenNMT-tf, you can already customize the activation function in a custom model definition. See the ffn_activation argument of the Transformer model. The Swish activation is available in TensorFlow as tf.nn.silu.
It can be added later to CTranslate2, if needed.
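For example, a custom model definition passing Swish through ffn_activation could look like the sketch below (assuming OpenNMT-tf 2.x and TensorFlow >= 2.4; the layer sizes here are only placeholders):

```python
import tensorflow as tf
import opennmt

# Custom model definition file (e.g. model.py), used as:
#   onmt-main --model model.py --config data.yml --auto_config train
def model():
    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        num_layers=6,
        num_units=512,
        num_heads=8,
        ffn_inner_dim=2048,
        # Swish/SiLU activation for the feed-forward layers.
        ffn_activation=tf.nn.silu,
    )
```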
Great, thanks!