Efficient Pipeline Planning for Expedited Distributed DNN Training
Pipeline parallelism has recently emerged as a way to train modern large DNN models: it distributes the model across GPUs and lets different devices process different microbatches in a pipelined fashion. Earlier pipeline designs allow multiple versions of the model parameters to co-exist (similar to asynchronous training), and cannot ensure the same model convergence and …