Poor test set Accuracy

Hello,

I have a parallel corpus of almost 70k sentence pairs. I have done 3 trainings with this corpus, of 5000, 6500, and 10000 steps respectively.

After that I calculated the BLEU score using this command:
cat pred_6500.txt | sacrebleu entest.txt
Note: this is for the 6500-step training model.

This gives me output like:
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.14 = 9.3 42.8/14.4/6.8/4.1 (BP = 0.813 ratio = 0.829 hyp_len = 8963 ref_len = 10814)

What is my BLEU score here?? 9.3??
Then what are the other values, such as 42.8/14.4/6.8/4.1??

In addition,
for 5000 steps =>
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.14 = 7.3 43.5/13.4/5.6/2.7 (BP = 0.755 ratio = 0.780 hyp_len = 8438 ref_len = 10814)

for 10000 steps =>
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.14 = 8.9 44.6/15.4/7.3/4.3 (BP = 0.735 ratio = 0.764 hyp_len = 8266 ref_len = 10814)

If I take 9.3 as the BLEU score, then I have to say that it is very poor. So what is the way to make it better??

I am also providing my .yml file for your easy understanding:

# Where the samples will be written
save_data: run/example

# Where the vocab(s) will be written
src_vocab: run/example.vocab.src
tgt_vocab: run/example.vocab.tgt

# Prevent overwriting existing files in the folder
overwrite: False

data:
    corpus_1:
        path_src: bn.txt
        path_tgt: en.txt
    valid:
        path_src: bndev.txt
        path_tgt: endev.txt

save_model: run/model6500
save_checkpoint_steps: 10000
keep_checkpoint: 10
seed: 3435
train_steps: 6500
valid_steps: 6500
warmup_steps: 2000
report_every: 100

decoder_type: transformer
encoder_type: transformer
word_vec_size: 128
rnn_size: 128
layers: 3
transformer_ff: 2048
heads: 4

accum_count: 8
optim: adam
adam_beta1: 0.9
adam_beta2: 0.998
decay_method: noam
learning_rate: 2.0
max_grad_norm: 0.0

batch_size: 2048
batch_type: tokens
normalization: tokens
dropout: 0.1
label_smoothing: 0.1

max_generator_batches: 2

param_init: 0.0
param_init_glorot: 'true'
position_encoding: 'true'

world_size: 1
gpu_ranks:
- 0

Thanks in Advance !!

What is my BLEU score here?? 9.3??
Then what are the other values, such as 42.8/14.4/6.8/4.1??

Yes, BLEU is 9.3 here. The other values are the precisions for each n-gram order (1-gram through 4-gram).
I encourage you to use sacrebleu instead of the perl scripts. You can see some details of the implementation here for instance: https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/metrics/bleu.py
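To see how those numbers fit together, here is a small sketch (standard library only) that recomputes the 6500-step line from its parts. BLEU is the brevity penalty times the geometric mean of the four n-gram precisions; the numbers are taken from your output above (this ignores sacrebleu's smoothing, which only matters when a precision is zero):

```python
import math

# Numbers copied from the 6500-step sacrebleu output above.
precisions = [42.8, 14.4, 6.8, 4.1]   # 1-gram .. 4-gram precision, in percent
hyp_len, ref_len = 8963, 10814

# Brevity penalty: exp(1 - ref/hyp) when the hypothesis is shorter than
# the reference, else 1.0.
bp = math.exp(1 - ref_len / hyp_len) if hyp_len < ref_len else 1.0

# Geometric mean of the n-gram precisions.
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu = bp * geo_mean
print(round(bp, 3), round(bleu, 1))   # prints: 0.813 9.3
```

So the 9.3 already includes the 0.813 length penalty: your hypotheses are noticeably shorter than the references (ratio 0.829), which is dragging the score down on top of the low precisions.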

If I take 9.3 as the BLEU score, then I have to say that it is very poor. So what is the way to make it better??

70k segments is a very small dataset for learning this task. You can try to find more data, or generate some, for instance via iterative backtranslation. A multilingual setup with similar languages might also help.
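For illustration, the backtranslation idea can be sketched as follows. This is a minimal sketch, not OpenNMT code: `translate_en_to_bn` stands in for whatever reverse-direction (en→bn) model you would train, and the lambda below is a placeholder for it.

```python
def backtranslate(monolingual_en, translate_en_to_bn):
    """Build synthetic (bn, en) pairs from monolingual English text.

    The source side is machine-translated (noisy), but the target side
    is real English, which is what makes backtranslation effective.
    """
    synthetic_pairs = []
    for en_sentence in monolingual_en:
        bn_synthetic = translate_en_to_bn(en_sentence)  # reverse-model output
        synthetic_pairs.append((bn_synthetic, en_sentence))
    return synthetic_pairs

# Placeholder stand-in for a trained en->bn model (hypothetical):
fake_reverse_model = lambda s: "<synthetic bn for: %s>" % s

mono_en = ["This is monolingual English text.", "Another sentence."]
augmented = backtranslate(mono_en, fake_reverse_model)
```

You would then mix the synthetic pairs with your 70k real pairs, retrain bn→en, optionally retrain the reverse model with the improved system, and repeat (hence "iterative").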

Thanks a lot for your kind information !! I will ask further questions on this topic later.

But for now, apart from NMT, may I know some information about Example-Based Machine Translation (EBMT)?