Currently, reducing training time requires GPUs. The documentation makes it clear that the current implementation supports training with multiple GPUs. However, I have not found any way to run training across multiple nodes, say 5 CPU-only nodes, so that the total training time could be cut by roughly a factor of 5.
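To make the request concrete, here is a minimal sketch of the data-parallel scheme I have in mind: each CPU node holds a shard of the data, computes gradients locally, and the averaged gradient drives the update. This is not an OpenNMT API; the names (`shard_gradient`, `train`) and the toy linear model are purely illustrative, with `multiprocessing` standing in for separate nodes.

```python
# Conceptual sketch only, not OpenNMT code: data-parallel SGD where each
# "node" (here a worker process) computes gradients on its own data shard
# and the parameter update uses the average of all shard gradients.
from multiprocessing import Pool

def shard_gradient(args):
    """Gradient of mean squared error for the toy model y = w * x on one shard."""
    w, shard = args
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train(data, num_workers=5, lr=0.01, steps=200):
    """Split data into num_workers shards; average per-shard gradients each step."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    with Pool(num_workers) as pool:
        for _ in range(steps):
            # In a real multi-node setup, each map call would run on its own node
            # and the averaging would be an all-reduce over the network.
            grads = pool.map(shard_gradient, [(w, s) for s in shards])
            w -= lr * sum(grads) / num_workers
    return w

if __name__ == "__main__":
    # Synthetic data with true weight 3.0; training should recover it.
    data = [(x / 10, 3.0 * x / 10) for x in range(1, 101)]
    print(round(train(data), 2))  # → 3.0
```

Per-step gradient work scales down with the number of shards, which is where the hoped-for factor-of-5 speedup on 5 nodes would come from, minus communication overhead for the gradient averaging.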
It is great that OpenNMT is open source, but most NMT implementations (including OpenNMT) do little to optimize training time on CPUs. I am not sure whether OpenNMT already has provisions for training on CPU nodes in a distributed manner. If it does, please give that topic its own section in the documentation. If it does not, then in my opinion this feature has the potential to be a milestone for OpenNMT.