ACCL+: an FPGA-Based Collective Engine for Distributed Applications

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2312.11742

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • MCR-DL: Mix-and-Match Communication Runtime for Deep Learning (2023). Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda
  • Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models (2022). Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Özdal, Jade Nie, Jongsoo Park
  • GPU-initiated Fine-grained Overlap of Collective Communication with Computation (2023). Kishore Punniyamurthy, Bradford M. Beckmann, Khaled Hamidouche
  • FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure (2022). Yashael Faith Arthanto, David Ojika, Joo-Young Kim
  • GC3: An Optimizing Compiler for GPU Collective Communication (2022). Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong
  • Accelerating Recommender Systems via Hardware "scale-in" (2020). S. Murali Krishna, Ravi Krishna
  • Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models (2021). Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Özdal, Jade Nie, Jongsoo Park
  • TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs (2024). Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong
  • Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators (2023). Hans Johnson, Tianyang Fang, Alejandro Perez-Vicente, Jafar Saniie
  • CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA - a Practical Study with Trade-off Analysis (2016). Maohua Zhu, Liu Liu, Chao Wang, Yuan Xie
  • Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud (2018). Christian Pinto, Yiannis Gkoufas, Andrea Reale, Seetharami Seelam, Steven Eliuk
  • TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs (2023). Neha Bhairavi Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong
  • OCCL: a Deadlock-free Library for GPU Collective Communication (2023). Lichen Pan, Juncheng Liu, Jinhui Shi, Rongkai Zhang, Pengze Li, Zhen Xiao
  • Accelerating Recommender Systems via Hardware "scale-in" (2020). Suresh Krishna, Krishna Ravi
  • Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks (2022). Marius Meyer, Tobias Kenter, Christian Plessl
  • Data Processing with FPGAs on Modern Architectures (2023). Wenqi Jiang, Dario Korolija, Gustavo Alonso
  • Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks (2023). Marius Meyer, Tobias Kenter, Christian Plessl

Works That Cite This (0)


Works Cited by This (0)
