What are good values for acc and ppl?

Dear community

I am trying to set up a decent DE-to-IT NMT engine based on 7M datasets of high quality.
After training for 50,000 steps with a slightly modified "Google"-like transformer model (currently 4 layers instead of the 6 in the OpenNMT-py FAQ example) and SentencePiece tokenization, I get the following validation values: acc: 0.75, ppl: 2.98 and lr: 0.00042. I am wondering whether these are good values, or what other values I should aim for.
My options are to use more layers or to train for more steps.

What are your experiences? What should I do, and what values should I aim for?

Best regards, Kai

I think these values look okay, but how good they are depends quite a bit on the task and the data, so they are hard to judge in isolation.
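To put the ppl number in context, here is a small sketch of how perplexity relates to the training loss. It assumes that the reported perplexity is the exponential of the average per-token cross-entropy in nats, which is the usual convention; the function names are only illustrative and not part of any toolkit.

```python
import math


def ppl_to_cross_entropy(ppl: float) -> float:
    """Per-token cross-entropy in nats, assuming ppl = exp(cross-entropy)."""
    return math.log(ppl)


def cross_entropy_to_ppl(xent: float) -> float:
    """Inverse of the conversion above."""
    return math.exp(xent)


val_ppl = 2.98  # validation perplexity from the question
xent = ppl_to_cross_entropy(val_ppl)
print(f"ppl {val_ppl} corresponds to about {xent:.2f} nats per token")
```

Under that assumption, a ppl of 2.98 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly three tokens at each position. That still does not say how good the translations are, since it depends on the vocabulary and on the validation data.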

If you want a more realistic evaluation, you can find some test sets for your language pair, translate them and score the output with BLEU, for instance. BLEU is far from being a perfect metric, but it gives a good hint of where your model stands.
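To make the idea concrete, here is a simplified sketch of corpus-level BLEU in plain Python. It is not the reference implementation: it assumes one reference per sentence, whitespace tokenization and no smoothing, and the example sentences are made up. For numbers you want to compare with other systems, a standard tool such as sacreBLEU is the better choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100): single reference, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a score of zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up model output and reference translations, one sentence per entry.
hypotheses = ["il gatto è sul tavolo", "oggi piove molto forte"]
references = ["il gatto è sul tavolo", "oggi piove molto"]
score = corpus_bleu(hypotheses, references)
print(f"BLEU: {score:.1f}")
```

The score is computed over the whole test set rather than averaged per sentence, which is why the counts are accumulated before the precisions are taken.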