Hello there,
I’m working with the WMT18 corpora and test sets.
With a Turkish-English corpus of about 200K sentences, I got a BLEU score of about 12, which I think is normal because the corpus is very small.
With a German-English corpus of about 2 million sentences, I got a BLEU score of about 23, which again I think is normal because the corpus is not very big for NMT. With the Moses (SMT) toolkit, I got a BLEU score of 22.6.
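For comparing numbers like these, here is a minimal sketch of corpus-level BLEU (clipped 1- to 4-gram precisions combined with a brevity penalty). It assumes pre-tokenized text and a single reference per sentence, and it is a simplified illustration, not sacreBLEU or Moses' multi-bleu.perl, so its numbers will not exactly match theirs. Scores computed on differently tokenized output are not directly comparable.

```python
import collections
import math


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return collections.Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

For example, `corpus_bleu([hyp.split()], [ref.split()])` scores a single sentence pair; an exact match gives 100.0.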
Now I want to reach higher BLEU scores (above 30), but I haven’t managed to, whatever I try.
First, I tried a bigger corpus (5 million, 7 million sentences, etc., in different languages), but with the classical word-level approach I got many “unk” tokens. This is expected, because there are now far more distinct words. But when I increased the vocabulary size (for example, from 50K to 100K), I got a “CUDA out of memory” error.
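A rough back-of-the-envelope sketch of why a larger vocabulary runs out of GPU memory: the output layer and its logits grow linearly with the vocabulary size. The hidden size of 512 and the batch of 64 sentences × 50 target tokens are illustrative assumptions, not settings from the post, and the estimate counts only the forward float32 logits; softmax outputs and gradients add more on top.

```python
def output_layer_params(hidden_size, vocab_size):
    """Parameters of the final projection: weight matrix plus bias."""
    return hidden_size * vocab_size + vocab_size


def logits_memory_bytes(batch_sentences, tokens_per_sentence, vocab_size, bytes_per_value=4):
    """Memory for one forward pass of logits (float32 by default)."""
    return batch_sentences * tokens_per_sentence * vocab_size * bytes_per_value


# Assumed example settings (hypothetical): hidden size 512, 64 sentences x 50 tokens.
for vocab in (50_000, 100_000):
    params = output_layer_params(512, vocab)
    gb = logits_memory_bytes(64, 50, vocab) / 1e9
    print(f"vocab={vocab}: {params:,} output params, {gb:.2f} GB of logits")
```

Doubling the vocabulary from 50K to 100K doubles both numbers (about 0.64 GB to 1.28 GB of logits under these assumptions), which is one reason subword vocabularies of a few tens of thousands of units are common.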
Then I decided to use BPE techniques. I used SentencePiece, subword-NMT, etc., but I still cannot get good BLEU scores; every time, the scores were around 20.
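BPE avoids both the “unk” problem and the huge vocabulary by splitting rare words into subword units and stopping after a fixed number of merges. The following is a minimal sketch of the merge-learning loop in the style of the original BPE-for-NMT algorithm; it is a simplified illustration, not the actual subword-NMT or SentencePiece code, and the toy word counts are made up for the example.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge operations from a word-frequency dict."""
    # Start from characters, with an end-of-word marker on each word.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Toy word counts (hypothetical example data).
merges, segmented = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(merges[:3])
print(segmented)
```

The number of merges is the knob that bounds the subword vocabulary, so rare words are segmented into known pieces instead of becoming “unk”.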
I suppose I’m making a mistake somewhere. Can you give me some advice on what I can do to get higher BLEU scores?
Thanks in advance.