OpenNMT Forum

English to Russian translation with OpenNMT-py


(Tatyana Lysenko) #1

Hi,

I’ve run a series of English-to-Russian translation experiments, but so far I haven't been able to get a BLEU score higher than 12.2, and I wonder what can be done to bring it up to 20-30.

Training and validation: Yandex-1M dataset
Testing: WMT 13-18 datasets.

Experiments, ordered from best to worst BLEU score:

| # | Preprocessing | Model | BLEU score |
|---|---|---|---|
| 1 | none | Transformer on 1 GPU | 8.2-12.2 |
| 2 | none | Deeper architecture (enc layers 3, enc RNN size 800, dec layers 2, dec RNN size 800) | 8.1-12.1 |
| 3 | none | Standard ONMT-py | 8-12 |
| 4 | SentencePiece, standard model | Standard ONMT-py | 2.5-3.5 |
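For readers comparing numbers like these across the WMT test sets, here is a minimal sketch of the standard corpus-level BLEU-4 formula: clipped n-gram precisions, their geometric mean, and a brevity penalty. It assumes whitespace-tokenized text and a single reference per sentence, and it applies no smoothing. The helper names `ngrams` and `corpus_bleu` are illustrative, not part of OpenNMT-py. Because tools such as sacreBLEU apply their own tokenization, scores from this sketch will not exactly match published figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumes both inputs are lists of whitespace-tokenizable strings.
    No smoothing is applied, so any zero n-gram precision yields 0.0.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += sum(h_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


refs = ["the cat is on the mat"]
print(round(corpus_bleu(refs, refs), 1))  # 100.0
```

Scores are aggregated over the whole corpus before the precisions are combined, rather than averaged per sentence. That is why corpus BLEU is the figure usually reported for a test set.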


(Vincent Nguyen) #2

You need more data; the Yandex corpus is too small.


(Tatyana Lysenko) #3

Thank you, Vincent. Roughly how much data would be sufficient? 5M? 10M?


(Vincent Nguyen) #4

With 5M you will start to get good results.