Can I finish training a 12-layer Transformer within 24 hours on Google Colab, with a dataset containing 2.5M segments?
I have not tried training at that scale on Google Colab, but in general I do not think so. Besides, the free version of Google Colab limits sessions to about 12 hours anyhow. You can still use
train_from to continue from where you stopped, although I do not think it is a practical solution.
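For reference, resuming in OpenNMT-py is a matter of pointing train_from at the last saved checkpoint. A minimal sketch of the relevant config lines (the paths and step counts here are placeholders, not from your setup):

```yaml
# Resuming an interrupted run (sketch; checkpoint path and steps are placeholders).
save_model: run/model
save_checkpoint_steps: 5000
train_from: run/model_step_10000.pt  # last checkpoint written before the session ended
train_steps: 200000                  # the overall target, not additional steps
```

You would then relaunch the usual way, e.g. `onmt_train -config config.yaml`, and training continues from the checkpoint's step counter.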
As for 12 layers on 2.5M segments: unless you are following a specific paper, this seems like too many layers for this amount of data.
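For comparison, the standard Transformer-base recipe from "Attention Is All You Need" uses 6 encoder and 6 decoder layers, and that is the usual starting point at this data size. In OpenNMT-py config terms it looks roughly like this (a sketch; option names can vary slightly between versions, so check the docs for yours):

```yaml
# Transformer-base hyperparameters (sketch; verify option names for your version).
encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 8
word_vec_size: 512
rnn_size: 512        # called hidden_size in more recent releases
transformer_ff: 2048
position_encoding: true
```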
It is clear you are new to OpenNMT, and this is okay; however, it makes sense to start with a simple project and make sure you master it before you move to a bigger one.
Finally, if you need help, you should first try things yourself and then be specific about the problem you have, explaining: 1) I did so; 2) I expected so; but 3) I got so. Otherwise, it is difficult for people to "predict" how to help you, even if they want to.
All the best,