TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness

Type: Preprint

Publication Date: 2024-10-18

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.14312

Abstract

DNN training is time-consuming and requires efficient multi-accelerator parallelization, where a single training iteration is split over the available accelerators. Current approaches often parallelize training using intra-batch parallelization, and combining inter-batch and intra-batch pipeline parallelism is a common way to further improve training throughput. In this article, we develop a system, called TiMePReSt, that combines them in a novel way that better overlaps computation and communication and limits the amount of communication. Traditional pipeline-parallel training of DNNs follows the same working principle as sequential (conventional) training by maintaining consistent weight versions across the forward and backward passes of a mini-batch, and it therefore suffers from a high GPU memory footprint during training. The experimental study in this paper demonstrates that compromising weight consistency does not decrease the prediction capability of a DNN trained in parallel. Moreover, TiMePReSt overcomes the GPU memory overhead and achieves zero weight staleness. State-of-the-art techniques often become costly in terms of training time. To address this issue, TiMePReSt introduces a variant of intra-batch parallelism that parallelizes the forward pass of each mini-batch by decomposing it into smaller micro-batches, and a novel synchronization method between forward and backward passes reduces training time. The occurrence of the multiple-sequence problem and its relation to version difference have been observed in TiMePReSt. The paper presents a mathematical relationship between the number of micro-batches and the number of worker machines, highlighting how the version difference varies, and develops a mathematical expression that computes the version difference for any combination of the two without drawing pipeline diagrams for every combination.
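
To make the micro-batching idea in the abstract concrete, here is a minimal, self-contained sketch (plain Python) of splitting one mini-batch's forward pass into micro-batches that flow through a chain of pipeline stages. This is an illustrative assumption, not the TiMePReSt implementation: the stage functions, the micro-batch count, and the sequential loop standing in for per-GPU stages are all hypothetical.

    # Toy sketch: intra-batch parallelism via micro-batches (illustrative only).
    # In a real pipeline each stage runs on its own accelerator, so stage k can
    # process micro-batch m+1 while stage k+1 processes micro-batch m.
    from typing import Callable, List, Sequence

    Stage = Callable[[List[float]], List[float]]

    def split_into_microbatches(minibatch: Sequence[float], num_micro: int) -> List[List[float]]:
        """Partition a mini-batch into roughly equal micro-batches."""
        size = max(1, len(minibatch) // num_micro)
        return [list(minibatch[i:i + size]) for i in range(0, len(minibatch), size)]

    def pipelined_forward(minibatch: Sequence[float], stages: List[Stage], num_micro: int) -> List[List[float]]:
        """Emulate the forward pass micro-batch by micro-batch through each stage."""
        outputs = []
        for micro in split_into_microbatches(minibatch, num_micro):
            activation = micro
            for stage in stages:
                activation = stage(activation)  # would be an inter-GPU send/recv in practice
            outputs.append(activation)
        return outputs

    if __name__ == "__main__":
        # Two toy "stages": scale every element, then shift it.
        stages = [lambda xs: [2.0 * x for x in xs], lambda xs: [x + 1.0 for x in xs]]
        print(pipelined_forward([float(x) for x in range(8)], stages, num_micro=4))

In TiMePReSt the micro-batch forward passes would run concurrently across workers and be synchronized with the backward pass of the mini-batch; the sketch only shows the data decomposition, not the scheduling or the version-difference analysis.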

Locations

  • arXiv (Cornell University)

Similar Works

  • Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform (2018). Chi‐Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
  • XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training (2019). Lei Guan, Wotao Yin, Dongsheng Li, Xicheng Lu
  • BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training (2020). Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin
  • BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training (2024). Houming Wu, Ling Chen, Wenjie Yu
  • Survey on Large Scale Neural Network Training (2022). Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud‐Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets
  • HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism (2020). Jay Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training (2018). Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Greg Ganger, Phil Gibbons
  • DAPPLE: A Pipelined Data Parallel Approach for Training Large Models (2020). Shiqing Fan, Yi Rong, Meng Chen, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training (2018). Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons
  • Efficient Pipeline Planning for Expedited Distributed DNN Training (2022). Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin
  • Parareal Neural Networks Emulating a Parallel-in-Time Algorithm (2022). Youngkyu Lee, Jong-Ho Park, Chang-Ock Lee
  • Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models (2022). Zhiquan Lai, Shengwei Li, Xudong Tang, Keshi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li
  • Parareal Neural Networks Emulating a Parallel-in-time Algorithm (2021). Chang-Ock Lee, Youngkyu Lee, Jong-Ho Park
  • Parareal Neural Networks Emulating a Parallel-in-time Algorithm (2020). Chang-Ock Lee, Youngkyu Lee, Jong-Ho Park
  • HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring (2020). Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv
  • Whale: A Unified Distributed Training Framework (2020). Ang Wang, Xianyan Jia, Le Jiang, Jie Zhang, Yong Li, Wei Lin
  • Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling (2020). Akio Hayakawa, Takuya Narihira
  • DistSim: A performance model of large-scale hybrid distributed DNN training (2023). Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng
  • PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management (2021). Jiarui Fang, Yu Yang, Zilin Zhu, Shenggui Li, Yang You, Jie Zhou

Works That Cite This (0)


Works Cited by This (0)
