Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Type: Preprint

Publication Date: 2023-01-01

Citations: 7

DOI: https://doi.org/10.48550/arxiv.2312.15234

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Efficient Large Language Models: A Survey (2023). Zhongwei Wan, Xin Wang, Liu Che, Samiul Alam, Yu Zheng, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury
  • The Efficiency Spectrum of Large Language Models: An Algorithmic Survey (2023). Tianyu Ding, Tianyi Chen, Haidong Zhu, J. Z. Jiang, Yiqi Zhong, Jin-Xin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
  • Do Generative Large Language Models Need Billions of Parameters? (2024). Sia Gholami
  • Do Generative Large Language Models need billions of parameters? (2023). Sia Gholami, Marwan Omar
  • Towards Pareto Optimal Throughput in Small Language Model Serving (2024). Pol G. Recasens, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral
  • Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models (2024). Guangji Bai, Zheng Chai, Ling Chen, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang
  • AI Safety in Generative AI Large Language Models: A Survey (2024). Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, Lina Yao
  • Exploring the landscape of large language models: Foundations, techniques, and challenges (2024). Milad Moradi, Ke Yan, David B. Colwell, Matthias Samwald, Rhona Asgari
  • Small Language Models (SLMs) Can Still Pack a Punch: A survey (2025). Shreyas Subramanian, Vikram Elango, Mecit Gungor
  • Small Language Models: Survey, Measurements, and Insights (2024). Zhichun Lu, Xiang Li, Daoping Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu
  • Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives (2024). Desta Haileselassie Hagos, Rick Battle, Danda B. Rawat
  • LLM Inference Serving: Survey of Recent Advances and Opportunities (2024). Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari
  • A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models (2024). Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha
  • What is the Role of Small Models in the LLM Era: A Survey (2024). Lihu Chen, Gaël Varoquaux
  • Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) (2024). Krishnaram Kenthapadi, Mehrnoosh Sameki, Ankur Taly
  • Adaptive Draft-Verification for Efficient Large Language Model Decoding (2024). Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu
  • InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (2024). Wonbeom Lee, Jungi Lee, Jung-Hwan Seo, Jaewoong Sim
  • Large Language Models: A Survey (2024). Shervin Minaee, Tomas Mikolov, Narjes Nikzad-Khasmakhi, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
  • Exploring Advanced Large Language Models with LLMsuite (2024). Giorgio Roffo
  • Challenges and Responses in the Practice of Large Language Models (2024). Hongjun Zhu

Works Cited by This (0)
