BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating
Large Models Training
As models grow in scale, the need for efficient distributed training has become increasingly urgent. Recently, many synchronous pipeline parallelism approaches have been proposed to improve training throughput. However, these approaches still suffer from two major issues: pipeline bubbles caused by periodic flushing and extra communication due to …