Add support for distributed training on multiple CPU-only nodes

Currently, the only way to reduce training time is to use GPUs. The documentation makes it clear that the current implementation supports training a model with multiple GPUs. However, I have not found any way to run training across multiple CPU-only nodes, say 5 nodes, so that the total training time could be reduced by roughly a factor of 5.

It is good that OpenNMT is open source, but most NMT implementations (including OpenNMT) do little to optimize training time on CPUs. I am not sure whether OpenNMT already has provisions for training on CPU nodes in a distributed manner. If it does, please document that in a dedicated section. If there is no such provision, then in my opinion this feature has the potential to be a milestone for OpenNMT.

Hi @abhishek - we do not support distributed training on multiple nodes today. It is important and on my task list, but rather far down. The idea would be to use MPI - through torch MPI (https://github.com/sixin-zh/mpiT) - and I would be glad to work with anyone interested in developing that.
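
To make the MPI idea concrete, here is a minimal sketch of synchronous data-parallel training: each node computes gradients on its own data shard, the gradients are averaged with an allreduce, and every node applies the same update so parameters stay in sync. This is an illustration only, written with mpi4py and numpy rather than torch/mpiT; `compute_gradients` and `next_local_batch` are hypothetical placeholders, not OpenNMT APIs.

```python
# Hypothetical sketch of synchronous data-parallel SGD over MPI.
# Illustration only: the actual OpenNMT plan would go through
# torch MPI (mpiT), whose API differs from mpi4py.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this node's id: 0 .. size-1
size = comm.Get_size()   # number of CPU nodes, e.g. 5

def next_local_batch(rank):
    # Placeholder: each node would read its own shard of the training data.
    return None

def compute_gradients(params, batch):
    # Placeholder: stands in for a forward/backward pass of the NMT model.
    return np.random.randn(*params.shape).astype(np.float32)

params = np.zeros(1_000_000, dtype=np.float32)  # flattened model parameters
lr = 0.1

for step in range(100):
    grads = compute_gradients(params, next_local_batch(rank))

    # Sum gradients across all nodes in place, then average.
    comm.Allreduce(MPI.IN_PLACE, grads, op=MPI.SUM)
    grads /= size

    # Identical update on every node keeps the replicas in sync.
    params -= lr * grads

if rank == 0:
    print("finished", size, "-way data-parallel training")
```

Launched with something like `mpirun -np 5 -hostfile hosts python train.py`, this only approaches a factor-of-5 speedup when the per-node batch is large enough to amortize the allreduce; in practice, communication cost caps the gain.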