Hello @SamuelLacombe
I have not analyzed those independently. This definitely needs fine-tuning and individual testing to figure out what works best.
Yes, the testing data is not part of the training data, but it’s in-domain.
True.
I got the BLEU scores on the validation/test data for each model (5 models in a single NMT model). BLEU seems to be a relative metric, but it can be a good indicator of whether the models are heading in the right direction.
BLEU-0: Tokenized validation data
BLEU-1: Detokenized validation data
BLEU-2: Tokenized test data
BLEU-3: Detokenized test data
28500 v6/src-vocab.txt
BLEU-0 = 34.76, 61.1/41.1/31.6/25.8 (BP=0.919, ratio=0.922, hyp_len=57572, ref_len=62445)
BLEU-1 = 27.46, 53.0/33.0/23.3/17.7 (BP=0.942, ratio=0.944, hyp_len=37347, ref_len=39568)
BLEU-2 = 36.32, 62.8/43.3/33.4/27.6 (BP=0.913, ratio=0.916, hyp_len=30255, ref_len=33015)
BLEU-3 = 29.16, 55.1/35.0/25.2/19.6 (BP=0.934, ratio=0.936, hyp_len=19841, ref_len=21200)
25000 v7/src-vocab.txt
BLEU-0 = 35.27, 61.2/41.6/32.2/26.3 (BP=0.920, ratio=0.923, hyp_len=59325, ref_len=64243)
BLEU-1 = 27.45, 52.9/32.8/23.3/17.8 (BP=0.943, ratio=0.945, hyp_len=37372, ref_len=39568)
BLEU-2 = 36.37, 62.4/43.2/33.5/27.6 (BP=0.916, ratio=0.919, hyp_len=31232, ref_len=33973)
BLEU-3 = 28.82, 54.6/34.7/24.9/19.3 (BP=0.933, ratio=0.935, hyp_len=19822, ref_len=21200)
22000 v8/src-vocab.txt
BLEU-0 = 36.14, 61.6/42.7/33.0/26.9 (BP=0.925, ratio=0.927, hyp_len=62436, ref_len=67324)
BLEU-1 = 27.20, 52.9/32.8/23.1/17.5 (BP=0.940, ratio=0.942, hyp_len=37255, ref_len=39568)
BLEU-2 = 36.61, 62.4/43.5/33.6/27.4 (BP=0.921, ratio=0.924, hyp_len=32956, ref_len=35675)
BLEU-3 = 27.98, 54.0/33.8/23.9/18.2 (BP=0.937, ratio=0.939, hyp_len=19909, ref_len=21200)
19000 v9/src-vocab.txt
BLEU-0 = 37.18, 61.7/43.6/33.8/27.5 (BP=0.935, ratio=0.937, hyp_len=68244, ref_len=72811)
BLEU-1 = 26.67, 52.3/32.0/22.4/16.8 (BP=0.946, ratio=0.948, hyp_len=37501, ref_len=39568)
BLEU-2 = 37.92, 62.6/44.7/34.7/28.3 (BP=0.931, ratio=0.934, hyp_len=36004, ref_len=38564)
BLEU-3 = 27.57, 53.6/33.3/23.5/17.7 (BP=0.938, ratio=0.940, hyp_len=19934, ref_len=21200)
16000 v10/src-vocab.txt
BLEU-0 = 38.48, 61.8/44.8/35.0/28.3 (BP=0.946, ratio=0.947, hyp_len=79756, ref_len=84217)
BLEU-1 = 24.92, 50.8/30.3/20.6/15.1 (BP=0.947, ratio=0.949, hyp_len=37537, ref_len=39568)
BLEU-2 = 39.34, 63.1/46.2/36.2/29.3 (BP=0.939, ratio=0.941, hyp_len=42260, ref_len=44924)
BLEU-3 = 25.53, 52.1/31.4/21.3/15.6 (BP=0.939, ratio=0.941, hyp_len=19948, ref_len=21200)
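
As a sanity check on how to read these lines, here is a minimal sketch that parses one score line and recomputes the score from its parts. It assumes the usual BLEU definition (brevity penalty times the geometric mean of the four n-gram precisions) and that the lines follow exactly the layout pasted above. The names `parse_score_line`, `brevity_penalty` and `recompute_bleu` are made up for this illustration and are not part of any scoring tool.

```python
import math
import re

# Matches one score line in the layout pasted above (an assumption: other
# scoring tools may print a different layout).
SCORE_RE = re.compile(
    r"BLEU(?:-\d+)? = (?P<bleu>[\d.]+), "
    r"(?P<precisions>[\d.]+(?:/[\d.]+){3}) "
    r"\(BP=(?P<bp>[\d.]+), ratio=(?P<ratio>[\d.]+), "
    r"hyp_len=(?P<hyp_len>\d+), ref_len=(?P<ref_len>\d+)\)"
)


def parse_score_line(line):
    """Return the reported fields of one score line as a dict."""
    match = SCORE_RE.search(line)
    if match is None:
        raise ValueError(f"not a BLEU score line: {line!r}")
    return {
        "bleu": float(match["bleu"]),
        "precisions": [float(p) for p in match["precisions"].split("/")],
        "bp": float(match["bp"]),
        "hyp_len": int(match["hyp_len"]),
        "ref_len": int(match["ref_len"]),
    }


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty: 1.0 unless the hypothesis is shorter."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def recompute_bleu(precisions, hyp_len, ref_len):
    """Brevity penalty times the geometric mean of the precisions (percent)."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(hyp_len, ref_len) * math.exp(log_mean)


# First score line of the 28500-vocabulary run, copied from above.
line = ("BLEU-0 = 34.76, 61.1/41.1/31.6/25.8 "
        "(BP=0.919, ratio=0.922, hyp_len=57572, ref_len=62445)")
fields = parse_score_line(line)
recomputed = recompute_bleu(
    fields["precisions"], fields["hyp_len"], fields["ref_len"]
)
print(f"reported {fields['bleu']:.2f}, recomputed {recomputed:.2f}")
```

The recomputed value can differ slightly from the reported one because the precisions are only printed to one decimal place. One thing the parsed fields make easy to see: ref_len for the detokenized sets stays at 39568 and 21200 across all five vocabulary sizes, while the tokenized ref_len grows as the vocabulary shrinks, so the detokenized scores are probably the ones on a common basis for comparing the models.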