Parallel Training

I added a data-parallel option to training (GPU only), with two modes (see the sketch after the list):

  • synchronous training (default) - batches are processed in parallel on several replicas that share the same synchronized parameters; the gradients are then aggregated, the parameters are updated, and the replicas are synchronized again
  • asynchronous training - the replicas process batches at their own speed and each updates a master copy of the parameters after every batch; at a given moment, the replicas do not share exactly the same parameters

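To make the difference concrete, here is a minimal, framework-agnostic sketch of the two update schemes. It is an illustration only, under my own assumptions: the names (`grad`, `synchronous_training`, `asynchronous_training`, the constants) are hypothetical and are not OpenNMT APIs or options.

```python
# Minimal sketch of synchronous vs. asynchronous data-parallel updates.
# Hypothetical names; not OpenNMT code.
import random

PARAM_DIM = 4    # size of the (toy) parameter vector
N_REPLICAS = 3   # number of parallel replicas
N_STEPS = 5      # training steps per replica
LR = 0.1         # learning rate

def grad(params, batch):
    # Stand-in for a real forward/backward pass: returns a fake gradient.
    random.seed(batch)
    return [random.uniform(-1, 1) for _ in params]

def synchronous_training(params, batches):
    # Each step: all replicas compute gradients on the *same* parameters,
    # the gradients are averaged, the parameters are updated once, then
    # the updated parameters are shared by all replicas again.
    for step in range(N_STEPS):
        grads = [grad(params, batches[step * N_REPLICAS + r])
                 for r in range(N_REPLICAS)]
        avg = [sum(g[i] for g in grads) / N_REPLICAS for i in range(PARAM_DIM)]
        params = [p - LR * g for p, g in zip(params, avg)]
    return params

def asynchronous_training(params, batches):
    # Replicas run at their own pace: each fetches the current master copy,
    # computes a gradient on it, and applies the update immediately; the
    # master parameters may already have moved on by the time it does.
    master = list(params)
    for batch in batches[:N_STEPS * N_REPLICAS]:
        local = list(master)                       # possibly stale copy
        g = grad(local, batch)
        master = [p - LR * gi for p, gi in zip(master, g)]
    return master

if __name__ == "__main__":
    init = [0.0] * PARAM_DIM
    batches = list(range(N_STEPS * N_REPLICAS))
    print("sync :", synchronous_training(init, batches))
    print("async:", asynchronous_training(init, batches))
```

In the synchronous case every update is computed from the same parameter snapshot; in the asynchronous case a replica may compute its gradient on parameters that are already stale, which is exactly the trade-off described in the second bullet above.
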
See the details here: http://opennmt.net//Guide/#parallel-training.

This is still in testing - try it and share your feedback!