Converging the training in meaningful time

Hi all,

I’m new to OpenNMT and trying to train a model on the public UN corpus with all default parameters. I only have 1 GPU (Tesla K).
The training is running well and seems to utilize my GPU, but it still took 2.5 days to train epoch 1.

I wonder if I will be able to train the model in meaningful time. I have the following questions:

  • My perplexity is 4.77 right now and is decreasing, but I’ve no idea whether this is good or bad. Are there any estimates of what counts as a good value, or of when training is considered complete?
  • Is there any estimate of how many epochs training typically takes? I’ve seen low numbers such as 5-15 when searching this forum, but I wonder if it is always like that.
  • What are general techniques to improve performance (preferably without starting from scratch)? It looks like the default parameters already include e.g. dropout; would it make sense to make it more aggressive?



You usually want to run a test translation on gold data and compute a translation metric (typically BLEU). As long as the metric you care about is improving, the training is not complete.
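To make the metric concrete, here is a minimal corpus-level BLEU sketch in Python. This is a hypothetical helper for illustration, not the scorer tool shipped with OpenNMT; it assumes pre-tokenized sentences and one reference per hypothesis.

```python
# Minimal corpus-level BLEU sketch: clipped n-gram precision (n=1..4),
# geometric mean, and brevity penalty. Illustration only, not OpenNMT's scorer.
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    clipped = [0] * max_n   # matched n-grams, clipped by reference counts
    totals = [0] * max_n    # total n-grams in the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            totals[n - 1] += sum(hyp_ngrams.values())
            clipped[n - 1] += sum(min(c, ref_ngrams[g])
                                  for g, c in hyp_ngrams.items())
    if min(clipped) == 0:
        return 0.0  # any precision of zero makes the geometric mean zero
    log_precision = sum(math.log(c / t)
                        for c, t in zip(clipped, totals)) / max_n
    brevity = min(1.0, math.exp(1 - ref_len / hyp_len))
    return 100 * brevity * math.exp(log_precision)
```

A perfect match scores 100; in practice you track the score across checkpoints and stop when it plateaus on held-out data.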

It depends on the size of the dataset. But this is ultimately related to the first answer: you want to continue as long as it is learning something from the data.

During retraining, tuning the learning rate or changing the optimization strategy could be a way to improve the model performance. But it is not an exact science and requires a lot of experimentation.
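As a concrete example of one such strategy, here is a sketch of a "decay on plateau" learning-rate schedule, similar in spirit to the classic SGD defaults. The function name and the default values (decay factor 0.7, start epoch 9) are assumptions for illustration; the actual option names differ across OpenNMT versions.

```python
# Sketch of a plateau-based learning-rate decay schedule (hypothetical helper;
# exact option names and defaults vary by OpenNMT version).
def next_learning_rate(lr, epoch, val_ppl, prev_val_ppl,
                       decay=0.7, start_decay_at=9):
    """Decay the learning rate once past start_decay_at, or as soon as
    validation perplexity stops improving between epochs."""
    plateaued = prev_val_ppl is not None and val_ppl >= prev_val_ppl
    if epoch >= start_decay_at or plateaued:
        return lr * decay
    return lr
```

Called once per epoch, this keeps the learning rate constant while the model is still improving and shrinks it geometrically afterwards.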

During inference, you can still increase the beam size to search across more hypotheses.
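To show what a larger beam buys you, here is a toy beam search sketch over a hypothetical step function `expand(prefix)` that returns `(token, log_prob)` candidates. This is not the OpenNMT decoder itself, just an illustration of why more hypotheses can surface a better-scoring translation than greedy decoding.

```python
# Toy beam search: keep the beam_size best partial hypotheses at each step.
# `expand` is a hypothetical scoring function standing in for the real model.
import math

def beam_search(expand, beam_size, max_len, eos="</s>"):
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished hypothesis
                continue
            for token, logp in expand(seq):
                candidates.append((seq + [token], score + logp))
        # prune: keep only the beam_size highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]
```

With `beam_size=1` this degenerates to greedy search; larger beams explore more of the search space at a linear cost in decoding time.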

Thanks Guillaume for your answer!

Is BLEU computation part of the training? I can see a BLEU script in the OpenNMT repo, but I don’t see any related entries in the logs, or training options that would compute BLEU and stop the training when necessary.

Thanks, I will try that. That said, one epoch takes 2.5 days for me, so even if we just change the learning rate, the training is most likely going to be a multi-week effort. Should I just reduce the training set? Or try techniques like sampling?

You can check the -validation_metric option during training. Otherwise, you can just wait for a new checkpoint to be saved, run a translation, and evaluate with the scorer tool.

You usually don’t want to reduce the training set unless you are doing constrained experiments. I recommend using the data sampling approach to reduce the size of an epoch.
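The sampling idea can be sketched in a few lines. This is an illustration of the concept, not OpenNMT's implementation (the relevant option names vary by version): each epoch trains on a random subset of the corpus, so one epoch finishes much faster while, over many epochs, the model still sees the whole dataset.

```python
# Sketch of per-epoch data sampling: draw a fresh random subset of
# (source, target) pairs for each epoch instead of iterating the full corpus.
import random

def sample_epoch(corpus, sample_size, rng=random):
    """Return a random subset of training pairs for one epoch."""
    if sample_size >= len(corpus):
        return list(corpus)
    return rng.sample(corpus, sample_size)
```

For example, sampling 1M sentence pairs per epoch from a 10M-pair corpus would cut this user's 2.5-day epochs roughly tenfold, at the cost of needing more epochs to cover the data.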

Thanks Guillaume! Makes sense.