TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness

Type: Preprint

Publication Date: 2024-10-18

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.14312

Abstract

DNN training is time-consuming and requires efficient multi-accelerator parallelization, where a single training iteration is split over the available accelerators. Current approaches often parallelize training using intra-batch parallelization, and combining inter-batch and intra-batch pipeline parallelism is a common way to further improve training throughput. In this article, we develop a system, called TiMePReSt, that combines them in a novel way that better overlaps computation and communication and limits the amount of communication. Traditional pipeline-parallel training of DNNs follows the same working principle as sequential (conventional) training by maintaining consistent weight versions across the forward and backward passes of a mini-batch, and it therefore suffers from a high GPU memory footprint during training. The experimental study in this paper demonstrates that compromising weight consistency does not decrease the prediction capability of a DNN trained in parallel. Moreover, TiMePReSt overcomes the GPU memory overhead and achieves zero weight staleness. State-of-the-art techniques often become costly in terms of training time. To address this issue, TiMePReSt introduces a variant of intra-batch parallelism that parallelizes the forward pass of each mini-batch by decomposing it into smaller micro-batches, and a novel synchronization method between forward and backward passes reduces training time. The occurrence of the multiple-sequence problem and its relation to version difference have been observed in TiMePReSt. The paper presents a mathematical relationship between the number of micro-batches and the number of worker machines, highlighting how the version difference varies, and develops a mathematical expression that computes the version difference for any combination of the two without drawing pipeline diagrams for every combination.
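
To make the micro-batching idea in the abstract concrete, here is a minimal, self-contained sketch (plain Python) of splitting one mini-batch's forward pass into micro-batches that flow through a chain of pipeline stages. This is an illustrative assumption, not the TiMePReSt implementation: the stage functions, the micro-batch count, and the sequential loop standing in for per-GPU stages are all hypothetical.

    # Toy sketch: intra-batch parallelism via micro-batches (illustrative only).
    # In a real pipeline each stage runs on its own accelerator, so stage k can
    # process micro-batch m+1 while stage k+1 processes micro-batch m.
    from typing import Callable, List, Sequence

    Stage = Callable[[List[float]], List[float]]

    def split_into_microbatches(minibatch: Sequence[float], num_micro: int) -> List[List[float]]:
        """Partition a mini-batch into roughly equal micro-batches."""
        size = max(1, len(minibatch) // num_micro)
        return [list(minibatch[i:i + size]) for i in range(0, len(minibatch), size)]

    def pipelined_forward(minibatch: Sequence[float], stages: List[Stage], num_micro: int) -> List[List[float]]:
        """Emulate the forward pass micro-batch by micro-batch through each stage."""
        outputs = []
        for micro in split_into_microbatches(minibatch, num_micro):
            activation = micro
            for stage in stages:
                activation = stage(activation)  # would be an inter-GPU send/recv in practice
            outputs.append(activation)
        return outputs

    if __name__ == "__main__":
        # Two toy "stages": scale every element, then shift it.
        stages = [lambda xs: [2.0 * x for x in xs], lambda xs: [x + 1.0 for x in xs]]
        print(pipelined_forward([float(x) for x in range(8)], stages, num_micro=4))

In TiMePReSt the micro-batch forward passes would run concurrently across workers and be synchronized with the backward pass of the mini-batch; the sketch only shows the data decomposition, not the scheduling or the version-difference analysis.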

Locations

  • arXiv (Cornell University)

Similar Works

  • Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform (2018). Chi‐Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
  • XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training (2019). Lei Guan, Wotao Yin, Dongsheng Li, Xicheng Lu
  • BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training (2020). Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin
  • BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training (2024). Houming Wu, Ling Chen, Wenjie Yu
  • Survey on Large Scale Neural Network Training (2022). Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud‐Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets
  • HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism (2020). Jay Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training (2018). Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Greg Ganger, Phil Gibbons
  • DAPPLE: A Pipelined Data Parallel Approach for Training Large Models (2020). Shiqing Fan, Yi Rong, Meng Chen, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training (2018). Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons
  • Efficient Pipeline Planning for Expedited Distributed DNN Training (2022). Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin
  • Parareal Neural Networks Emulating a Parallel-in-Time Algorithm (2022). Youngkyu Lee, Jong-Ho Park, Chang-Ock Lee
  • Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models (2022). Zhiquan Lai, Shengwei Li, Xudong Tang, Keshi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li
  • Parareal Neural Networks Emulating a Parallel-in-time Algorithm (2021). Chang-Ock Lee, Youngkyu Lee, Jong-Ho Park
  • Parareal Neural Networks Emulating a Parallel-in-time Algorithm (2020). Chang-Ock Lee, Youngkyu Lee, Jong-Ho Park
  • HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring (2020). Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv
  • Whale: A Unified Distributed Training Framework (2020). Ang Wang, Xianyan Jia, Le Jiang, Jie Zhang, Yong Li, Wei Lin
  • Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling (2020). Akio Hayakawa, Takuya Narihira
  • DistSim: A performance model of large-scale hybrid distributed DNN training (2023). Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng
  • PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management (2021). Jiarui Fang, Yu Yang, Zilin Zhu, Shenggui Li, Yang You, Jie Zhou

Works That Cite This (0)


Works Cited by This (0)
