You usually want to run a test translation on held-out gold data and compute a translation metric (typically BLEU). As long as the metric you care about is still improving, training is not complete.
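As a minimal, self-contained sketch of how such a metric is computed, here is a corpus-level BLEU in plain Python (geometric mean of modified n-gram precisions times a brevity penalty). In practice you would use a standard tool such as sacreBLEU rather than this toy implementation; the function and variable names here are illustrative only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # total hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

Tracking this score on the same validation set after each checkpoint tells you whether another round of training is still paying off.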
It depends on the size of the dataset, but this is ultimately the same criterion as the first answer: you want to continue as long as the model is still learning something from the data.
During retraining, tuning the learning rate or changing the optimization strategy can improve model performance. But it is not an exact science and requires a lot of experimentation.
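One common heuristic for this kind of tuning is reduce-on-plateau: lower the learning rate when the validation metric stops improving. A minimal sketch, assuming a higher-is-better metric such as BLEU (the function name and parameters are hypothetical, not from any particular toolkit):

```python
def reduce_on_plateau(lr, history, patience=2, factor=0.5, min_lr=1e-6):
    """Return a reduced learning rate when the validation metric
    (higher is better) has not improved in the last `patience`
    evaluations; otherwise return the current rate unchanged."""
    if len(history) > patience and max(history[-patience:]) <= max(history[:-patience]):
        return max(lr * factor, min_lr)
    return lr
```

Most training frameworks ship a scheduler along these lines; the point is that the decision is driven by the validation metric, not by a fixed step count.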
During inference, you can also increase the beam size to search across more hypotheses, at the cost of slower decoding.
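To illustrate why a larger beam can help, here is a toy beam search over a hypothetical next-token scoring function (`score_fn` returns `(token, probability)` pairs for a given prefix; all names are illustrative). A beam of 1 is greedy decoding and can commit to a locally attractive token that leads to a worse full hypothesis, which a wider beam avoids:

```python
import math

def beam_search(score_fn, start, eos, beam_size=5, max_len=10):
    """Keep the `beam_size` highest log-probability partial
    hypotheses at each step; return the best finished sequence."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == eos:
                candidates.append((seq, logp))  # carry finished hypotheses forward
                continue
            for tok, p in score_fn(seq):
                candidates.append((seq + [tok], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return max(beams, key=lambda c: c[1])[0]
```

With a model where the best first token does not lead to the best full translation, `beam_size=1` returns the greedy path while `beam_size=2` recovers the globally better one, which is exactly the effect you buy by raising the beam size at inference time.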