|
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
|
2025
|
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Y. Charles Zhang
Stephanie W. Wang
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
|
|
Palu: Compressing KV-Cache with Low-Rank Projection
|
2024
|
Chi-Chih Chang
Wei-Cheng Lin
Chien-Yu Lin
Chong-Yan Chen
Yu-Fang Hu
Pei-Shuo Wang
Ning-Chi Huang
Luís Ceze
Kai-Chiang Wu
|
|
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
|
2024
|
Size Zheng
Renze Chen
Meng Li
Zihao Ye
Luís Ceze
Yun Liang
|
|
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
|
2023
|
Zihao Ye
Ruihang Lai
Junru Shao
Tianqi Chen
Luís Ceze
|
|
Punica: Multi-Tenant LoRA Serving
|
2023
|
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luís Ceze
Arvind Krishnamurthy
|
|
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
|
2023
|
Yilong Zhao
Chien-Yu Lin
Kan Zhu
Zihao Ye
Lequn Chen
Size Zheng
Luís Ceze
Arvind Krishnamurthy
Tianqi Chen
Baris Kasikci
|
|
Passively sensing SARS-CoV-2 RNA in public transit buses
|
2022
|
Jason S. Hoffman
Matthew Hirano
Nuttada Panpradist
Joseph Breda
Parker S. Ruth
Yuanyi Xu
Jonathan Lester
Bichlien H. Nguyen
Luís Ceze
Shwetak Patel
|
|
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
|
2022
|
Zihao Ye
Ruihang Lai
Junru Shao
Tianqi Chen
Luís Ceze
|
|
Characterizing and Taming Resolution in Convolutional Neural Networks
|
2021
|
Eddie Yan
Liang Luo
Luís Ceze
|
|
Pure tensor program rewriting via access patterns (representation pearl)
|
2021
|
Gus Henry Smith
Andrew Liu
Steven Lyubomirsky
Scott Davidson
Joseph McMahan
Michael Taylor
Luís Ceze
Zachary Tatlock
|
|
Passively Sensing SARS-CoV-2 RNA in Public Transit Buses
|
2021
|
Jason S. Hoffman
Matthew Hirano
Nuttada Panpradist
Joseph Breda
Parker S. Ruth
Yuanyi Xu
Jonathan Lester
Bichlien H. Nguyen
Luís Ceze
Shwetak Patel
|
|
Automated Backend-Aware Post-Training Quantization
|
2021
|
Ziheng Jiang
Animesh Jain
Andy Liu
Josh Fromm
Ma Chengqian
Tianqi Chen
Luís Ceze
|
|
VSS: A Storage System for Video Analytics [Technical Report]
|
2021
|
Brandon Haynes
Maureen Daum
Dong He
Amrita Mazumdar
Magdalena Balazinska
Alvin Cheung
Luís Ceze
|
|
Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks
|
2021
|
Chien-Yu Lin
Liang Luo
Luís Ceze
|
|
Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering
|
2021
|
Liang Luo
Jacob Nelson
Arvind Krishnamurthy
Luís Ceze
|
|
Srift: Swift and Thrift Cloud-Based Distributed Training
|
2020
|
Liang Luo
P. West
Arvind Krishnamurthy
Luís Ceze
|
|
Enumerating Hardware-Software Splits with Program Rewriting
|
2020
|
G.P. Smith
Zachary Tatlock
Luís Ceze
|
|
Srifty: Swift and Thrifty Distributed Training on the Cloud
|
2020
|
Liang Luo
Peter West
Arvind Krishnamurthy
Luís Ceze
|
|
A Hardware–Software Blueprint for Flexible Deep Learning Specialization
|
2019
|
Thierry Moreau
Tianqi Chen
Luis Vega
Jared Roesch
Eddie Yan
Lianmin Zheng
Josh Fromm
Ziheng Jiang
Luís Ceze
Carlos Guestrin
|
|
Synthesizing Number Generators for Stochastic Computing using Mixed Integer Programming
|
2019
|
Vincent T. Lee
Archibald Samuel Elliott
Armin Alaghi
Luís Ceze
|
|
Vignette: Perceptual Compression for Video Storage and Processing Systems
|
2019
|
Amrita Mazumdar
Brandon Haynes
Magdalena Balazinska
Luís Ceze
Alvin Cheung
Mark Oskin
|
|
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
|
2018
|
Liang Luo
Jacob Nelson
Luís Ceze
Amar Phanishayee
Arvind Krishnamurthy
|
|
Parameter Hub
|
2018
|
Liang Luo
Jacob Nelson
Luís Ceze
Amar Phanishayee
Arvind Krishnamurthy
|
|
VTA: An Open Hardware-Software Stack for Deep Learning
|
2018
|
Thierry Moreau
Tianqi Chen
Ziheng Jiang
Luís Ceze
Carlos Guestrin
Arvind Krishnamurthy
|
|
A Hardware-Software Blueprint for Flexible Deep Learning Specialization
|
2018
|
Thierry Moreau
Tianqi Chen
Luis Vega
Jared Roesch
Eddie Yan
Lianmin Zheng
Josh Fromm
Ziheng Jiang
Luís Ceze
Carlos Guestrin
|
|
Learning to Optimize Tensor Programs
|
2018
|
Tianqi Chen
Lianmin Zheng
Eddie Yan
Ziheng Jiang
Thierry Moreau
Luís Ceze
Carlos Guestrin
Arvind Krishnamurthy
|
|
Correlation Manipulating Circuits for Stochastic Computing
|
2018
|
Vincent T. Lee
Armin Alaghi
Luís Ceze
|
|
MATIC: Learning around errors for efficient low-voltage neural network accelerators
|
2018
|
Sung Kim
Patrick Howe
Thierry Moreau
Armin Alaghi
Luís Ceze
Visvesh Sathe
|
|
Correlation manipulating circuits for stochastic computing
|
2018
|
Vincent T. Lee
Armin Alaghi
Luís Ceze
|
|
TVM: End-to-End Optimization Stack for Deep Learning
|
2018
|
Tianqi Chen
Thierry Moreau
Ziheng Jiang
Haichen Shen
Eddie Yan
Leyuan Wang
Yuwei Hu
Luís Ceze
Carlos Guestrin
Arvind Krishnamurthy
|
|
Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
|
2018
|
Liang Luo
Jacob Nelson
Luís Ceze
Amar Phanishayee
Arvind Krishnamurthy
|
|
Computer Security Risks of Distant Relative Matching in Consumer Genetic Databases
|
2018
|
Peter Ney
Luís Ceze
Tadayoshi Kohno
|
|
Stochastic Synthesis for Stochastic Computing
|
2018
|
Vincent T. Lee
Armin Alaghi
Luís Ceze
Mark Oskin
|
|
Automating Generation of Low Precision Deep Learning Operators
|
2018
|
Meghan Cowan
Thierry Moreau
Tianqi Chen
Luís Ceze
|
|
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
|
2018
|
Tianqi Chen
Thierry Moreau
Ziheng Jiang
Lianmin Zheng
Eddie Yan
Meghan Cowan
Haichen Shen
Leyuan Wang
Yuwei Hu
Luís Ceze
|
|
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
|
2018
|
Liang Luo
Jacob Nelson
Luís Ceze
Amar Phanishayee
Arvind Krishnamurthy
|
|
Exploring computation-communication tradeoffs in camera systems
|
2017
|
Amrita Mazumdar
Thierry Moreau
Sung Kim
Meghan Cowan
Armin Alaghi
Luís Ceze
Mark Oskin
Visvesh Sathe
|
|
MATIC: Adaptation and In-situ Canaries for Energy-Efficient Neural Network Acceleration
|
2017
|
Sung Kim
Patrick Howe
Thierry Moreau
Armin Alaghi
Luís Ceze
Visvesh Sathe
|
|
Exploring Computation-Communication Tradeoffs in Camera Systems
|
2017
|
Amrita Mazumdar
Thierry Moreau
Sung Hoon Kim
Meghan Cowan
Armin Alaghi
Luís Ceze
Mark Oskin
Visvesh Sathe
|
|
Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
|
2017
|
Vincent T. Lee
Armin Alaghi
John P. Hayes
Visvesh Sathe
Luís Ceze
|
|
Similarity Search on Automata Processors
|
2017
|
Vincent T. Lee
Justin Kotalik
Carlo C. del Mundo
Armin Alaghi
Luís Ceze
Mark Oskin
|
|
Making data center computations fast, but not so furious
|
2017
|
Daniel Porto
João Loff
Rui Policarpo Duarte
Luís Ceze
Rodrigo Rodrigues
|
|
Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing
|
2017
|
Vincent T. Lee
Armin Alaghi
John P. Hayes
Visvesh Sathe
Luís Ceze
|
|
Democratizing Design for Future Computing Platforms
|
2017
|
Luís Ceze
Mark D. Hill
Karthikeyan Sankaralingam
Thomas F. Wenisch
|
|
MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators
|
2017
|
Sung Kim
Patrick Howe
Thierry Moreau
Armin Alaghi
Luís Ceze
Visvesh Sathe
|
|
Making data center computations fast, but not so furious
|
2017
|
Daniel A. Porto
João Loff
Rui Policarpo Duarte
Luís Ceze
Rodrigo Rodrigues
|
|
Near Memory Similarity Search on Automata Processors
|
2016
|
Vincent T. Lee
Justin Kotalik
Carlo C. del Mundo
Armin Alaghi
Luís Ceze
Mark Oskin
|
|
Similarity Search on Automata Processors
|
2016
|
Vincent T. Lee
Justin Kotalik
Carlo C. del Mundo
Armin Alaghi
Luís Ceze
Mark Oskin
|
|
NCAM: Near-Data Processing for Nearest Neighbor Search
|
2016
|
Vincent T. Lee
Carlo C. del Mundo
Armin Alaghi
Luís Ceze
Mark Oskin
Ali Farhadi
|
|
Arch2030: A Vision of Computer Architecture Research over the Next 15 Years
|
2016
|
Luís Ceze
Mark D. Hill
Thomas F. Wenisch
|
|
Application-Driven Near-Data Processing for Similarity Search
|
2016
|
Vincent T. Lee
Amrita Mazumdar
Carlo C. del Mundo
Armin Alaghi
Luís Ceze
Mark Oskin
|
|
21st Century Computer Architecture
|
2016
|
Mark D. Hill
Sarita V. Adve
Luís Ceze
M.J. Irwin
David Kaeli
Margaret Martonosi
Josep Torrellas
Thomas F. Wenisch
David Wood
Katherine Yelick
|
|
SAP: an Architecture for Selectively Approximate Wireless Communication
|
2015
|
Benjamin Ransford
Luís Ceze
|
|
The impact of memory models on software reliability in multiprocessors
|
2011
|
Alexander Jaffe
Thomas Moscibroda
Laura Effinger-Dean
Luís Ceze
Karin Strauß
|
|
The Impact of Memory Models on Software Reliability in Multiprocessors
|
2011
|
Alexander Jaffe
Thomas Moscibroda
Laura Effinger-Dean
Luís Ceze
Karin Strauß
|
|
Sparse Direct Methods
|
2011
|
Jack Dongarra
Piotr Łuszczek
Felix Wolf
Jesper Larsson Träff
Patrice Quinton
Hermann Hellwagner
Martin Fränzle
Christian Lengauer
Luís Ceze
Kei Hiraki
|