+
|
Squeezed Attention: Accelerating Long Context Length LLM Inference
|
2024
|
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Muthucumaru Maheswaran
Joonki Paik
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
|
+
|
Efficient and Scalable Estimation of Tool Representations in Vector Space
|
2024
|
Suhong Moon
Siddharth Jha
Lutfi Eren Erdogan
Sehoon Kim
Woosang Lim
Kurt Keutzer
Amir Gholami
|
+
|
TinyAgent: Function Calling at the Edge
|
2024
|
Lutfi Eren Erdogan
Nick Lee
Siddharth Jha
Se Hoon Kim
Ryan Tabrizi
Suhong Moon
Coleman Hooper
Gopala K. Anumanchipalli
Kurt Keutzer
Amir Gholami
|
+
|
Characterizing Prompt Compression Methods for Long Context Inference
|
2024
|
Siddharth Jha
Lutfi Eren Erdogan
Sehoon Kim
Kurt Keutzer
Amir Gholami
|
+
|
Reliable edge machine learning hardware for scientific applications
|
2024
|
Tommaso Lisini Baldi
Javier Campos
Benjamin Hawks
J. Ngadiuba
Nhan Viet Tran
Daniel Díaz
J. Duarte
Ryan Kastner
Andres Meza
M. Quinnan
|
+
|
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
|
2024
|
Nick Lee
Thanakul Wattanawong
Sehoon Kim
Karttikeya Mangalam
Sheng Shen
Gopala Anumanchipalli
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
|
+
|
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
|
2024
|
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Michael W. Mahoney
Yakun Sophia Shao
Kurt Keutzer
Amir Gholami
|
+
|
Speculative Decoding with Big Little Decoder
|
2023
|
Sehoon Kim
Karttikeya Mangalam
Jitendra Malik
Michael W. Mahoney
Amir Gholami
Kurt Keutzer
|
+
|
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior
|
2023
|
Shashank Subramanian
Peter Harrington
Kurt Keutzer
W. Bhimji
Dmitriy Morozov
Michael W. Mahoney
Amir Gholami
|
+
|
SqueezeLLM: Dense-and-Sparse Quantization
|
2023
|
Sehoon Kim
Coleman Hooper
Amir Gholami
Zhen Dong
Xiuyu Li
Sheng Shen
Michael W. Mahoney
Kurt Keutzer
|
+
|
SPEED: Speculative Pipelined Execution for Efficient Decoding
|
2023
|
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Hasan Genç
Kurt Keutzer
Amir Gholami
Sophia Shao
|
+
|
An LLM Compiler for Parallel Function Calling
|
2023
|
Sehoon Kim
Suhong Moon
Ryan Tabrizi
Nick Lee
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
|
+
|
Learned Token Pruning for Transformers
|
2022
|
Sehoon Kim
Sheng Shen
David Thorsley
Amir Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
|
+
|
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition
|
2022
|
Sehoon Kim
Amir Gholami
Zhewei Yao
Nick Lee
Patrick Wang
Aniruddha Nrusimha
Bohan Zhai
Tianren Gao
Michael W. Mahoney
Kurt Keutzer
|
+
|
A Survey of Quantization Methods for Efficient Neural Network Inference
|
2022
|
Amir Gholami
Sehoon Kim
Zhen Dong
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
|
+
|
Hessian-Aware Pruning and Optimal Neural Implant
|
2022
|
Shixing Yu
Zhewei Yao
Amir Gholami
Zhen Dong
Sehoon Kim
Michael W. Mahoney
Kurt Keutzer
|
+
|
A Fast Post-Training Pruning Framework for Transformers
|
2022
|
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
Amir Gholami
|
+
|
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
|
2022
|
Sehoon Kim
Amir Gholami
Albert C. Shaw
Nicholas Lee
Karttikeya Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
|
+
|
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
|
2021
|
Zhewei Yao
Amir Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
|
+
|
Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition
|
2021
|
Sehoon Kim
Amir Gholami
Zhewei Yao
Aniruddha Nrusimha
Bohan Zhai
Tianren Gao
Michael W. Mahoney
Kurt Keutzer
|
+
|
I-BERT: Integer-only BERT Quantization
|
2021
|
Sehoon Kim
Amir Gholami
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
|
+
|
Hessian-Aware Pruning and Optimal Neural Implant
|
2021
|
Shixing Yu
Zhewei Yao
Amir Gholami
Zhen Dong
Michael W. Mahoney
Kurt Keutzer
|
+
|
Integer-only Zero-shot Quantization for Efficient Speech Recognition
|
2021
|
Sehoon Kim
Amir Gholami
Zhewei Yao
Nick Lee
Patrick Wang
Aniruddha Nrusimha
Bohan Zhai
Tianren Gao
Michael W. Mahoney
Kurt Keutzer
|
+
|
A Survey of Quantization Methods for Efficient Neural Network Inference
|
2021
|
Amir Gholami
Sehoon Kim
Zhen Dong
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
|
+
|
Learned Token Pruning for Transformers
|
2021
|
Sehoon Kim
Sheng Shen
David Thorsley
Amir Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
|
+
|
PyHessian: Neural Networks Through the Lens of the Hessian
|
2020
|
Zhewei Yao
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
|
+
|
HAWQV3: Dyadic Neural Network Quantization
|
2020
|
Zhewei Yao
Zhen Dong
Zhangcheng Zheng
Amir Gholami
Jiali Yu
Eric Tan
Leyuan Wang
Qijing Huang
Yida Wang
Michael W. Mahoney
|
+
|
Boundary thickness and robustness in learning models
|
2020
|
Yaoqing Yang
Rajiv Khanna
Yaodong Yu
Amir Gholami
Kurt Keutzer
Joseph E. Gonzalez
Kannan Ramchandran
Michael W. Mahoney
|
+
|
ZeroQ: A Novel Zero Shot Quantization Framework
|
2020
|
Yaohui Cai
Zhewei Yao
Zhen Dong
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
Inefficiency of K-FAC for Large Batch Size Training
|
2020
|
Linjian Ma
Gabe Montague
Jiayu Ye
Zhewei Yao
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
|
+
|
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
|
2020
|
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
Rethinking Batch Normalization in Transformers
|
2020
|
Sheng Shen
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
PowerNorm: Rethinking Batch Normalization in Transformers
|
2020
|
Sheng Shen
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
|
2020
|
Zhewei Yao
Amir Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
|
+
|
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
|
2019
|
Zhen Dong
Zhewei Yao
Yaohui Cai
Daiyaan Arfeen
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
|
2019
|
Paras Jain
Ajay N. Jain
Aniruddha Nrusimha
Amir Gholami
Pieter Abbeel
Kurt Keutzer
Ion Stoica
Joseph E. Gonzalez
|
+
|
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
|
2019
|
Zhen Dong
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
Trust Region Based Adversarial Attack on Neural Networks
|
2019
|
Zhewei Yao
Amir Gholami
Peng Xu
Kurt Keutzer
Michael W. Mahoney
|
+
|
Simulation of glioblastoma growth using a 3D multispecies tumor model with mass effect
|
2019
|
Shashank Subramanian
Amir Gholami
George Biros
|
+
|
Inefficiency of K-FAC for Large Batch Size Training
|
2019
|
Linjian Ma
Gabe Montague
Jiayu Ye
Zhewei Yao
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
|
+
|
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
|
2019
|
Zhen Dong
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
|
2019
|
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Zhewei Yao
Amir Gholami
Michael W. Mahoney
Kurt Keutzer
|
+
|
PyHessian: Neural Networks Through the Lens of the Hessian
|
2019
|
Zhewei Yao
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
|
+
|
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
|
2019
|
Paras Jain
Ajay N. Jain
Aniruddha Nrusimha
Amir Gholami
Pieter Abbeel
Joseph E. Gonzalez
Kurt Keutzer
Ion Stoica
|
+
|
CLAIRE: A Distributed-Memory Solver for Constrained Large Deformation Diffeomorphic Image Registration
|
2019
|
Andreas Mang
Amir Gholami
Christos Davatzikos
George Biros
|
+
|
Integrated Model, Batch, and Domain Parallelism in Training Neural Networks
|
2018
|
Amir Gholami
Ariful Azad
Peter Jin
Kurt Keutzer
Aydın Buluç
|
+
|
Co-design of deep neural nets and neural net accelerators for embedded vision applications
|
2018
|
Kiseok Kwon
Alon Amid
Amir Gholami
Bichen Wu
Krste Asanović
Kurt Keutzer
|
+
|
SqueezeNext: Hardware-Aware Neural Network Design
|
2018
|
Amir Gholami
Kiseok Kwon
Bichen Wu
Zizheng Tai
Xiangyu Yue
Peter Jin
Sicheng Zhao
Kurt Keutzer
|
+
|
Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications
|
2018
|
Kiseok Kwon
Alon Amid
Amir Gholami
Bichen Wu
Krste Asanović
Kurt Keutzer
|
+
|
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
|
2018
|
Zhewei Yao
Amir Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
|
+
|
Large batch size training of neural networks with adversarial training and second-order information
|
2018
|
Zhewei Yao
Amir Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
|
+
|
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
|
2018
|
Noah Golmant
Nikita Vemuri
Zhewei Yao
Vladimir Feinberg
Amir Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
|
+
|
Parameter Re-Initialization through Cyclical Batch Size Schedules
|
2018
|
Norman Mu
Zhewei Yao
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
|
+
|
Trust Region Based Adversarial Attack on Neural Networks
|
2018
|
Zhewei Yao
Amir Gholami
Peng Xu
Kurt Keutzer
Michael W. Mahoney
|
+
|
SqueezeNext: Hardware-Aware Neural Network Design
|
2018
|
Amir Gholami
Kiseok Kwon
Bichen Wu
Zizheng Tai
Xiangyu Yue
Peter J. Jin
Sicheng Zhao
Kurt Keutzer
|
+
|
Integrated Model and Data Parallelism in Training Neural Networks
|
2017
|
Amir Gholami
Ariful Azad
Kurt Keutzer
Aydın Buluç
|
+
|
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
|
2017
|
Amir Gholami
Ariful Azad
Peter Jin
Kurt Keutzer
Aydın Buluç
|
+
|
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
|
2017
|
Amir Gholami
Ariful Azad
Peter J. Jin
Kurt Keutzer
Aydın Buluç
|
+
|
Distributed-Memory Large Deformation Diffeomorphic 3D Image Registration
|
2016
|
Andreas Mang
Amir Gholami
George Biros
|
+
|
FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube
|
2016
|
Amir Gholami
Dhairya Malhotra
Hari Sundar
George Biros
|
+
|
AccFFT: A library for distributed-memory FFT on CPU and GPU architectures
|
2015
|
Amir Gholami
Judith Hill
Dhairya Malhotra
George Biros
|