ACCL+: an FPGA-Based Collective Engine for Distributed Applications

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2312.11742

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • MCR-DL: Mix-and-Match Communication Runtime for Deep Learning (2023). Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda
  • Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models (2022). Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Özdal, Jade Nie, Jongsoo Park
  • GPU-initiated Fine-grained Overlap of Collective Communication with Computation (2023). Kishore Punniyamurthy, Bradford M. Beckmann, Khaled Hamidouche
  • FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure (2022). Yashael Faith Arthanto, David Ojika, Joo-Young Kim
  • GC3: An Optimizing Compiler for GPU Collective Communication (2022). Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong
  • Accelerating Recommender Systems via Hardware "scale-in" (2020). S. Murali Krishna, Ravi Krishna
  • Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models (2021). Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Özdal, Jade Nie, Jongsoo Park
  • TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs (2024). Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong
  • Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators (2023). Hans Johnson, Tianyang Fang, Alejandro Perez-Vicente, Jafar Saniie
  • CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA - a Practical Study with Trade-off Analysis (2016). Maohua Zhu, Liu Liu, Chao Wang, Yuan Xie
  • Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud (2018). Christian Pinto, Yiannis Gkoufas, Andrea Reale, Seetharami Seelam, Steven Eliuk
  • TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs (2023). Neha Bhairavi Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong
  • OCCL: a Deadlock-free Library for GPU Collective Communication (2023). Lichen Pan, Juncheng Liu, Jinhui Shi, Rongkai Zhang, Pengze Li, Zhen Xiao
  • Accelerating Recommender Systems via Hardware "scale-in" (2020). Suresh Krishna, Krishna Ravi
  • Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks (2022). Marius Meyer, Tobias Kenter, Christian Plessl
  • Data Processing with FPGAs on Modern Architectures (2023). Wenqi Jiang, Dario Korolija, Gustavo Alonso
  • Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks (2023). Marius Meyer, Tobias Kenter, Christian Plessl

Works That Cite This (0)


Works Cited by This (0)
