I use OpenNMT-py with Colab.
I trained a Transformer model with BPE on an fr2en dataset containing 200k sentences for training, 1000 for dev, and 1500 for test
(vocab size 10000, 12 layers, batch size 4096, 60000 train steps).
The BLEU score is:
BLEU = 15.20, 41.8/20.1/10.7/5.9 (BP=1.000, ratio=1.018, hyp_len=27277, ref_len=26786).
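As a quick sanity check, the score printed by multi-bleu.perl is the brevity penalty times the geometric mean of the four n-gram precisions; a small Python sketch using the numbers from the line above (slight differences come from the rounding of the printed precisions):

```python
import math

# Values copied from the multi-bleu.perl output above
precisions = [41.8, 20.1, 10.7, 5.9]  # 1- to 4-gram precisions, in %
bp = 1.000  # brevity penalty is 1 because hyp_len (27277) >= ref_len (26786)

# BLEU = BP * geometric mean of the n-gram precisions
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(f"BLEU = {bleu:.2f}")  # ~15.2, matching the reported score up to rounding
```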
I tried training several times but got the same result.
Good to see that you are making progress! Now, 200k sentences is very limited data for a generic model. For that, you need more data, at least 2 million sentences, and a vocab size of 50k.
The other option is to build an in-domain model; for example, use all the software localization datasets you can find (at least 500k sentences), and your dev and test sets should be from the same domain.
Thank you for your answer, Yasmin.
So the Transformer needs more data; I understand that.
I have a couple of other questions:
1. Some papers report BLEU between 0 and 1, but when I use multi-bleu.perl I get BLEU between 0 and 100. What is the difference? Is there anything to it?
2. Is 200k sentences for train, 1000 for dev, and 1500 for test a normal split?
For example, could I instead use 10k for dev and 15k for test?
It is like saying 0.49 or 49%; they are the same value, just on different scales.
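To make the scale point concrete, here is a minimal sketch, assuming the sacrebleu and nltk packages are installed; the toy sentences are made up. sacreBLEU, like multi-bleu.perl, reports on a 0 to 100 scale, while NLTK's corpus_bleu returns a value between 0 and 1 (the exact numbers can differ slightly because of tokenization and smoothing):

```python
import sacrebleu
from nltk.translate.bleu_score import corpus_bleu

# Toy hypothesis/reference pairs, just to show the output ranges
hyps = ["the cat sits on the mat", "there is a dog in the garden"]
refs = ["the cat sat on the mat", "there is a dog in the garden"]

# sacreBLEU (like multi-bleu.perl) reports BLEU on a 0-100 scale
print(sacrebleu.corpus_bleu(hyps, [refs]).score)

# NLTK reports the same metric on a 0-1 scale (expects tokenized input)
print(corpus_bleu([[r.split()] for r in refs], [h.split() for h in hyps]))
```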
The practice is to have the same number of sentences for dev and test to be able to compare the results. Depending on the size of data, something between 1000 and 5000 is enough.