I have tried training an English-to-Chinese model several times.
The total corpus size for the M6 model is 19855197.
v1
- English is not case-converted
- train: 19656646, val: 198551 - no word alignment file added
step: 200,000 (20w)
r:0.494755170243611
p:0.4779859562907683
f:0.48265058031158203
Model size: 986M
v2
- English is not case-converted
- train: 19656646, val: 198551 - word alignment file added
step: 200,000 (20w)
r:0.4976210266977913
p:0.4767123516687056
f:0.48350180947632854
Model size: 986M
v3
- English is converted to lowercase
- train: 19656646, val: 198551 - word alignment file added
step: 200,000 (20w)
r:0.4917068697418362
p:0.4751876029938719
f:0.4798988116173338
Model size: 986M
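To make the three runs easier to compare, here is a minimal Python sketch that puts the numbers above side by side. It only uses the figures from this post; the reading of r, p, f as recall, precision, and F-score is my assumption, and the variable names (`runs`, `f_deltas`, `f_spread`) are just for illustration.

```python
# Scores copied from the three runs above.
# Assumption: r = recall, p = precision, f = F-score.
runs = {
    "v1": {"r": 0.494755170243611, "p": 0.4779859562907683, "f": 0.48265058031158203},
    "v2": {"r": 0.4976210266977913, "p": 0.4767123516687056, "f": 0.48350180947632854},
    "v3": {"r": 0.4917068697418362, "p": 0.4751876029938719, "f": 0.4798988116173338},
}

# Corpus split: the same for all three versions.
train_size, val_size, total = 19656646, 198551, 19855197
assert train_size + val_size == total  # the split accounts for the whole corpus
val_fraction = val_size / total

# Change in f relative to v1, and the gap between the best and worst run.
baseline = runs["v1"]["f"]
f_deltas = {name: scores["f"] - baseline for name, scores in runs.items()}
f_values = [scores["f"] for scores in runs.values()]
f_spread = max(f_values) - min(f_values)

for name, delta in f_deltas.items():
    print(f"{name}: f={runs[name]['f']:.4f}  delta vs v1: {delta:+.4f}")
print(f"validation fraction: {val_fraction:.4%}")
print(f"f spread across versions: {f_spread:.4f}")
```

The gap between the best and worst f is under 0.004, so neither the word alignment file nor lowercasing changed the score much in these runs.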
I am using a Transformer model.
Do I need to increase the corpus size and the model size?
Is there any way to further improve the model's score? Please help.