Low BLEU score for translation tutorial

Hello everyone.

I’m following the translation tutorial in the docs. First I tried the same configuration as in the tutorial but got a BLEU score of 11, far below the 27.3 reported for the same-size model in “Attention Is All You Need”. In the paper they trained about four times longer, so I increased the number of steps to 200k and the number of GPUs to 4 (so the model does the equivalent of 800k steps), and I also used a shared vocabulary for English and German. With this setup the best BLEU score I got was 12.6, from the 190k-step checkpoint. Are these results normal? My latest configuration is below:

save_data: wmt
## Where the vocab(s) will be written 
src_vocab: wmt/wmtende.vocab
share_vocab: True

# Corpus opts:
data:
    commoncrawl:
        path_src: wmt/commoncrawl.de-en.en
        path_tgt: wmt/commoncrawl.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 23
    europarl:
        path_src: wmt/europarl-v7.de-en.en
        path_tgt: wmt/europarl-v7.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 19
    news_commentary:
        path_src: wmt/news-commentary-v11.de-en.en
        path_tgt: wmt/news-commentary-v11.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 3
    valid:
        path_src: wmt/valid.en
        path_tgt: wmt/valid.de
        transforms: [sentencepiece]

### Transform related opts:
#### Subword
src_subword_model: wmt/wmtende.model
src_subword_nbest: 1
src_subword_alpha: 0.0

#### Filter
src_seq_length: 150

# silently ignore empty lines in the data
skip_empty_level: silent

#General opts
save_model:  wmt/run/model
save_checkpoint_steps: 10000
keep_checkpoint: 10
seed: 3435
train_steps: 200000
valid_steps: 10000
warmup_steps: 8000
report_every: 100

# Batching
queue_size: 10000
bucket_size: 32768
world_size: 4
gpu_ranks: [0, 1, 2, 3]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 16
batch_size_multiple: 1
max_generator_batches: 2
accum_count: [3]
accum_steps: [0]

# Optimization
model_dtype: "fp32"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
share_decoder_embeddings: true
share_embeddings: true

Thank you,
David

Hi David!

I suspect the issue has something to do with sub-wording, as I have seen this before. To rule this possibility out, remove sentencepiece from the transforms and provide datasets that are already sub-worded.

If the issue is indeed related to sub-wording, it is usually due to one of two cases:

  • The text is not sub-worded at all, in which case there might be an issue with the configuration; or
  • The text is sub-worded twice; only one way of sub-wording should be used, either manual or through transforms (see the sketch after this list).
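
For example, if you encode the corpora yourself beforehand (e.g. with SentencePiece and the same wmtende.model), a corpus entry would keep only the filtertoolong transform. This is just a minimal sketch; the .sp.en/.sp.de file names are my assumption for wherever you save the pre-encoded files:

data:
    commoncrawl:
        # hypothetical paths to files already encoded offline with the shared SentencePiece model
        path_src: wmt/commoncrawl.de-en.sp.en
        path_tgt: wmt/commoncrawl.de-en.sp.de
        transforms: [filtertoolong]
        weight: 23

If BLEU recovers with pre-encoded data, the problem lies in how the sentencepiece transform is being applied; if it does not, sub-wording is probably not the culprit.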

You can also refer to these tutorials if you like:

Kind regards,
Yasmin


It looks like the configuration from the documentation is missing an important model option:

position_encoding: true
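
Without it, the Transformer gets no information about token order, which would explain scores this low. In your # Model section it would sit alongside the other options, e.g. (same values as in your config, only the missing line added):

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: true   # adds sinusoidal position encodings to the embeddings
enc_layers: 6
dec_layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048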

Thank you for your fast responses.
I’ve included the position_encoding option and the new model I’ve just trained reaches the BLEU score I expected.
I’ll also take a look at the tutorials you pointed out for further models.