I have trained a few many-to-one models where 6 languages are translated into English (with different source and target sentences across languages). During inference, the models keep producing the same one or two sentences for almost every input, something like this:
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m going to do .
I ' m not going to go to the hos pital .
" I ' m not going to go to school today .
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m going to do .
I ' m not going to go to the hos pital .
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m doing right now .
I ' m going to tell you what I ' m going to do .
" I ' m not going to go to school today .
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m going to do .
I ' m going to tell you what I ' m going to do .
Previously, I trained other models with different hyper-parameters and noticed the same issue. The following is the configuration the current models were trained with (the command I use to launch training is shown right after it):
save_checkpoint_steps: 5000
seed: 3435
train_steps: 200000
valid_steps: 10000
warmup_steps: 6000
report_every: 100
decoder_type: transformer
encoder_type: transformer
word_vec_size: 512
hidden_size: 512
layers: 6
transformer_ff: 2048
heads: 8
model_dtype: "fp16"
accum_count: 8
optim: adam
adam_beta1: 0.8
adam_beta2: 0.998
decay_method: noam
learning_rate: 1.0
max_grad_norm: 0.0
batch_size: 4096
valid_batch_size: 4096
batch_type: tokens
normalization: tokens
dropout: 0.1
label_smoothing: 0.1
param_init: 0.0
param_init_glorot: 'true'
position_encoding: 'true'
world_size: 1
gpu_ranks: [0]
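In case it's relevant, training is launched more or less like this (assuming the options above are saved as config.yaml; the data and vocabulary paths are defined in that same file and omitted here):

```bash
# Launch OpenNMT-py training with the configuration shown above.
# config.yaml is just the file name I use; data/vocab entries live in it too.
onmt_train -config config.yaml
```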
My training corpus contains 3.6 million sentences. I should also mention that neither -min_length nor -block_ngram_repeat helped to fix this problem.
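For completeness, inference was run roughly like this when I tried those options (the checkpoint and file names below are placeholders, and the exact values for -min_length and -block_ngram_repeat varied between runs):

```bash
# Roughly the translation command used; model/file names are placeholders.
onmt_translate -model model_step_200000.pt \
               -src test.src \
               -output pred.txt \
               -gpu 0 \
               -beam_size 5 \
               -min_length 3 \
               -block_ngram_repeat 2
```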