Zeke Wang

All published works
Title, year, authors
RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator (2024). Authors: Jie Zhang, Hongjing Huang, Xuzheng Xu, Xiang Li, Ming Liu, Zeke Wang.
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs (2024). Authors: Mo Sun, Zihan Yang, Changyue Liao, Y. Li, Fei Wu, Zeke Wang.
DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference (2024). Authors: Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin.
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU (2024). Authors: Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang.
Demystifying Datapath Accelerator Enhanced Off-path SmartNIC (2024). Authors: Xuzheng Chen, Jie Zhang, Ting Fu, Yifan Shen, Ma Shu, Kun Qian, Lingjun Zhu, Chao Shi, Ming Liu, Zeke Wang.
MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems (2023). Authors: Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo.
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs (2023). Authors: Hongjing Huang, Yingtao Li, Jie Sun, Xueying Zhu, Jie Zhang, Liang Luo, Jialin Li, Zeke Wang.
Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training (2023). Authors: Jie Sun, Li Su, Zuocheng Shi, Wenting Shen, Zeke Wang, Lei Wang, Jie Zhang, Yong Li, Wenyuan Yu, Jingren Zhou.
PyHGL: A Python-based Hardware Generation Language Framework (2023). Authors: Jintao Sun, Zeke Wang, Tao Lu, Wenzhi Chen.
Helios: An Efficient Out-of-core GNN Training System on Terabyte-scale Graphs with In-memory Performance (2023). Authors: Jie Sun, Mo Sun, Zheng Zhang, Jun Xie, Zuocheng Shi, Zihan Yang, Jie Zhang, Fei Wu, Zeke Wang.
ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs (2021). Authors: Chenhao Liu, Zhiyuan Shao, Zeke Wang, Kexin Li, Minkang Wu, Jiajie Chen, Xiaofei Liao, Hai Jin.
Benchmarking High Bandwidth Memory on FPGAs (2020). Authors: Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso.
Optimizing Memory Performance of Xilinx FPGAs under Vitis (2020). Authors: Ruoshi Li, Hongjing Huang, Zeke Wang, Zhiyuan Shao, Xiaofei Liao, Hai Jin.
Accelerating generalized linear models with MLWeaving (2019). Authors: Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang.
Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report) (2019). Authors: Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang.
A note on vector labelling algorithm (1998). Authors: Jianghua Fan, Zeke Wang.
Notes on computation of Kakutani fixed points (1996). Authors: Zeke Wang.
A Complexity Comparison of Kuhn’s Algorithm and Newton Method (1994). Authors: Zeke Wang, Xu Senlin, Tangan Gao.
Probabilistic Discussion on Zeros of Polynomial Mappings (1994). Authors: Zeke Wang, Xu Senlin, Tangan Gao.
Piecewise Linear Algorithms (1994). Authors: Zeke Wang, Xu Senlin, Tangan Gao.
Newton Method and Approximate Zeros (1994). Authors: Zeke Wang, Xu Senlin, Tangan Gao.
Homotopy Algorithms (1994). Authors: Zeke Wang, Sen-Lin Xu, Tangan Gao.
On the complexity of a PL homotopy algorithm for zeros of polynomials (1993). Authors: Tangan Gao, Zeke Wang.
Triangulate flat cones on simplices (1991). Authors: Zeke Wang.
On the geometry of paths generated by PL homotopy methods (1990). Authors: Zeke Wang.
A geometrical interpretation of the without-exception feasibility of PL homotopy methods (1989). Authors: Zeke Wang.
On zero distribution of a class of continuous functions (1987). Authors: Zeke Wang.
On the cost of computing roots of polynomials (1984). Authors: Harold W. Kuhn, Zeke Wang, Senlin Xu.
Commonly Cited References
Title, year, authors, number of times referenced
A PL homotopy for finding all the roots of a polynomial (1979). Authors: Masakazu Kojima, Hisakazu Nishino, Naohiko Arima. Times referenced: 4.
In-RDBMS hardware acceleration of advanced analytics (2018). Authors: Divya Mahajan, Joon‐Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, Hadi Esmaeilzadeh. Times referenced: 3.
Simplicial and Continuation Methods for Approximating Fixed Points and Solutions to Systems of Equations (1980). Authors: Eugene L. Allgower, Kurt Georg. Times referenced: 3.
The Solution of Systems of Piecewise Linear Equations (1976). Authors: B. Curtis Eaves, Herbert E. Scarf. Times referenced: 3.
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks (2018). Authors: Charles A. Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, Reetuparna Das. Times referenced: 3.
BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing (2018). Authors: Yaman Umuroglu, Lahiru Rasnayake, Magnus Själander. Times referenced: 3.
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (2019). Authors: Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian. Times referenced: 3.
Finding Roots of Polynomials By Pivoting (1977). Authors: Harold W. Kuhn. Times referenced: 3.
Finding all solutions to polynomial systems and other systems of equations (1979). Authors: C. B. Garcia, Willard I. Zangwill. Times referenced: 2.
Bit-pragmatic deep neural network computing (2017). Authors: Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O’Leary, Roman Genov, Andreas Moshovos. Times referenced: 2.
Efficient Processing of Deep Neural Networks: A Tutorial and Survey (2017). Authors: Vivienne Sze, Yu‐Hsin Chen, Tien-Ju Yang, Joel Emer. Times referenced: 2.
In-Datacenter Performance Analysis of a Tensor Processing Unit (2017). Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, S. C. Bates, Suresh Bhatia, Nan Boden, Al Borchers. Times referenced: 2.
Computation of all solutions to a system of polynomial equations (1983). Authors: Masakazu Kojima, Shinji Mizuno. Times referenced: 2.
MLbench (2018). Authors: Yü Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang. Times referenced: 2.
The fundamental theorem of algebra and complexity theory (1981). Authors: Steve Smale. Times referenced: 2.
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (2022). Authors: Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez. Times referenced: 2.
Feature hashing for large scale multitask learning (2009). Authors: Kilian Q. Weinberger, Anirban Dasgupta, John Langford, Alex Smola, Josh Attenberg. Times referenced: 2.
Fast Sorting Algorithms using AVX-512 on Intel Knights Landing (2017). Authors: Bérenger Bramas. Times referenced: 1.
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability (2017). Authors: Alberto Delmás, Sayeh Sharify, Patrick Judd, Andreas Moshovos. Times referenced: 1.
Scaling deep learning on GPU and knights landing clusters (2017). Authors: Yang You, Aydın Buluç, James Demmel. Times referenced: 1.
Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks (2017). Authors: Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof. Times referenced: 1.
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks (2017). Authors: Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Joon‐Kyung Kim, Vikas Chandra, Hadi Esmaeilzadeh. Times referenced: 1.
VIBNN (2018). Authors: Ruizhe Cai, Ao Ren, Ning Liu, Caiwen Ding, Luhao Wang, Xuehai Qian, Massoud Pedram, Yanzhi Wang. Times referenced: 1.
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions (2018). Authors: Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen. Times referenced: 1.
High-Accuracy Low-Precision Training (2018). Authors: Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré. Times referenced: 1.
Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA (2018). Authors: Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen. Times referenced: 1.
StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory (2019). Authors: Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin. Times referenced: 1.
Scaling Distributed Machine Learning with In-Network Aggregation (2019). Authors: Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik. Times referenced: 1.
EIE: Efficient Inference Engine on Compressed Deep Neural Network (2016). Authors: Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, William J. Dally. Times referenced: 1.
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent (2011). Authors: Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright. Times referenced: 1.
The MADlib Analytics Library or MAD Skills, the SQL (2012). Authors: Joe Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li. Times referenced: 1.
DimmWitted: A Study of Main-Memory Statistical Analytics (2014). Authors: Ce Zhang, Christopher Ré. Times referenced: 1.
In-Datacenter Performance Analysis of a Tensor Processing Unit (2017). Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, S. C. Bates, Suresh Bhatia, Nan Boden, Al Borchers. Times referenced: 1.
Real-time intelligent big data processing: technology, platform, and applications (2019). Authors: Tongya Zheng, Gang Chen, Xinyu Wang, Chun Chen, Xingen Wang, Sihui Luo. Times referenced: 1.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (2019). Authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. Times referenced: 1.
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (2018). Authors: Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Fırat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu. Times referenced: 1.
CASIA-SURF: A Large-Scale Multi-Modal Benchmark for Face Anti-Spoofing (2020). Authors: Shifeng Zhang, Ajian Liu, Jun Wan, Yanyan Liang, Guodong Guo, Sérgio Escalera, Hugo Jair Escalante, Stan Z. Li. Times referenced: 1.
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD (2018). Authors: Jianyu Wang, Gauri Joshi. Times referenced: 1.
Beyond Data and Model Parallelism for Deep Neural Networks (2018). Authors: Zhihao Jia, Matei Zaharia, Alex Aiken. Times referenced: 1.
The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface (2019). Authors: Hamid Reza Zohouri, Satoshi Matsuoka. Times referenced: 1.
FINN (2017). Authors: Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip H. W. Leong, Magnus Jahre, Kees Vissers. Times referenced: 1.
Exploring the Performance Benefit of Hybrid Memory System on HPC Environments (2017). Authors: Ivy Bo Peng, Roberto Gioiosa, Gökçen Kestor, Pietro Cicotti, Erwin Laure, Stefano Markidis. Times referenced: 1.
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models (2021). Authors: Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica. Times referenced: 1.
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads (2021). Authors: Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu‐Hsin Chen, Vikas Chandra. Times referenced: 1.
GSPMD: General and Scalable Parallelization for ML Computation Graphs (2021). Authors: Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake A. Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni. Times referenced: 1.
Fixed Point Theory and Applications (1992). Authors: Yeol Je Cho, Kang Sin-min, Jong Kyu Kim. Times referenced: 1.
Fixed Points. Algorithms and Applications (1977). Authors: S. Karamardian, C. B. Garcia. Times referenced: 1.
MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores (2022). Authors: Sheng-Chun Kao, Tushar Krishna. Times referenced: 1.
HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism (2020). Authors: Jay Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi. Times referenced: 1.