OpenNMT

Low BLEU score for translation tutorial

Hello everyone.

I’m following the translation tutorial available in the docs. First I tried the same configuration as in the tutorial, but got a BLEU score of 11, well below the 27.3 reported for the same-size model in “Attention Is All You Need”. In the paper they trained about four times more, so I changed the number of steps to 200k and the number of GPUs to 4 (so the model effectively does the equivalent of 800k steps), and also used a shared vocabulary for English and German. With this setup the best BLEU score I got was 12.6, from the 190k-step checkpoint. Are these results normal? My latest configuration is the following:

save_data: wmt
## Where the vocab(s) will be written 
src_vocab: wmt/wmtende.vocab
share_vocab: True

# Corpus opts:
data:
    commoncrawl:
        path_src: wmt/commoncrawl.de-en.en
        path_tgt: wmt/commoncrawl.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 23
    europarl:
        path_src: wmt/europarl-v7.de-en.en
        path_tgt: wmt/europarl-v7.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 19
    news_commentary:
        path_src: wmt/news-commentary-v11.de-en.en
        path_tgt: wmt/news-commentary-v11.de-en.de
        transforms: [sentencepiece, filtertoolong]
        weight: 3
    valid:
        path_src: wmt/valid.en
        path_tgt: wmt/valid.de
        transforms: [sentencepiece]

### Transform related opts:
#### Subword
src_subword_model: wmt/wmtende.model
src_subword_nbest: 1
src_subword_alpha: 0.0

#### Filter
src_seq_length: 150

# silently ignore empty lines in the data
skip_empty_level: silent

#General opts
save_model:  wmt/run/model
save_checkpoint_steps: 10000
keep_checkpoint: 10
seed: 3435
train_steps: 200000
valid_steps: 10000
warmup_steps: 8000
report_every: 100

# Batching
queue_size: 10000
bucket_size: 32768
world_size: 4
gpu_ranks: [0, 1, 2, 3]
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 16
batch_size_multiple: 1
max_generator_batches: 2
accum_count: [3]
accum_steps: [0]

# Optimization
model_dtype: "fp32"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
share_decoder_embeddings: true
share_embeddings: true
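
For reference, the numbers above are corpus BLEU against the reference; one way to compute such a score is with sacrebleu, for example (pred.detok.de and ref.de are placeholder names for the detokenized model output and the plain-text reference):

import sacrebleu

# Placeholder file names: detokenized model output and the plain-text reference.
with open("pred.detok.de", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("ref.de", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# corpus_bleu takes a list of hypothesis strings and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.1f}")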

Thank you,
David

Hi David!

I suspect the issue has something to do with sub-wording, as I have seen this before. To rule this out, remove sentencepiece from the transforms and provide datasets that are already sub-worded (see the sketch after the list below).

If the issue is indeed related to sub-wording, it is usually one of two cases:

  • The text is not sub-worded at all, in which case there might be an issue with the configuration; or
  • The text is sub-worded twice; only one method of sub-wording should be used, either manual or through the transforms.
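
As a quick check, you can sub-word the corpora offline with the SentencePiece model the tutorial builds and point the config at the resulting files, keeping only filtertoolong in the transforms. A minimal sketch, assuming the tutorial’s wmt/wmtende.model and a hypothetical .sp suffix for the output files:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("wmt/wmtende.model")  # SentencePiece model built by the tutorial

# Hypothetical output name: the original corpus file with an added .sp suffix.
with open("wmt/commoncrawl.de-en.en", encoding="utf-8") as fin, \
        open("wmt/commoncrawl.de-en.en.sp", "w", encoding="utf-8") as fout:
    for line in fin:
        pieces = sp.EncodeAsPieces(line.strip())
        fout.write(" ".join(pieces) + "\n")

Inspecting a few lines of the .sp files (whether the ▁ markers are there at all, and not doubled) quickly shows which of the two cases above applies.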

You can also refer to these tutorials if you like:

Kind regards,
Yasmin


It looks like the configuration from the documentation is missing an important model option:

position_encoding: true
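
With that added, the Model section of the config above would read, for example:

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: true    # add sinusoidal position encodings to the embeddings
enc_layers: 6
dec_layers: 6
# ... remaining Model options unchanged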

Thank you for your fast responses.
I’ve added the position_encoding option, and the new model I’ve just trained reaches the BLEU score I expected.
I’ll also take a look at the tutorials you pointed out for future models.