I am trying to set up a decent DE to IT NMT engine based on about 7M high-quality sentence pairs.
After training for 50,000 steps with a slightly modified “Google”-like Transformer model (currently 4 layers instead of the 6 in the OpenNMT-py FAQ example) and SentencePiece tokenization, I get the following validation values: acc: 0.75, ppl: 2.98 and lr: 0.00042. I am wondering whether these are good values, or what other values I should aim for.
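As a side note on reading these numbers: assuming the reported ppl is the exponential of the average per-token cross-entropy in nats (which is how I understand OpenNMT-py computes it, but please correct me if not), the two can be converted into each other. The helper names below are my own and only for illustration, a minimal sketch rather than anything from the toolkit:

```python
import math


def perplexity_to_cross_entropy(ppl: float) -> float:
    """Average per-token cross-entropy in nats, assuming ppl = exp(xent)."""
    return math.log(ppl)


def cross_entropy_to_perplexity(xent: float) -> float:
    """Inverse conversion: perplexity from per-token cross-entropy in nats."""
    return math.exp(xent)


# Validation perplexity reported in this post.
val_ppl = 2.98
val_xent = perplexity_to_cross_entropy(val_ppl)

print(f"ppl {val_ppl} corresponds to about {val_xent:.3f} nats per token")
```

So a ppl of 2.98 would correspond to roughly 1.09 nats per target token, which makes it easier to compare against training logs that report loss rather than perplexity.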
My options are to use more layers or to train for more steps.
What are your experiences, and what would you recommend?
Best regards, Kai