BitDelta: Your Fine-Tune May Only Be Worth One Bit

Type: Preprint

Publication Date: 2024-02-15

Citations: 0

DOI: https://doi.org/10.48550/arXiv.2402.10193

Abstract

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it is intuitive to assume that fine-tuning adds less new information to the model and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. This finding not only highlights the potential redundancy of information added during fine-tuning, but also has significant implications for the multi-tenant serving and storage of fine-tuned models. By enabling a single high-precision base model to be accompanied by multiple 1-bit deltas, BitDelta reduces GPU memory requirements by more than 10x, which also translates into improved generation latency in multi-tenant settings. We validate BitDelta through experiments on the Llama-2 and Mistral model families, on models of up to 70B parameters, showing minimal performance degradation across all tested settings.
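
To make the abstract's core idea concrete, the sketch below shows one way to binarize a fine-tuning delta: keep only the sign of each weight difference plus a single per-matrix scale, so a fine-tuned model is represented as the shared high-precision base plus a 1-bit delta. This is a minimal illustration under assumptions of its own, not the authors' implementation: the scale is initialized to the mean absolute delta and left uncalibrated (the paper further calibrates scales), and the names binarize_delta and apply_delta are hypothetical.

    import torch

    def binarize_delta(w_base: torch.Tensor, w_fine: torch.Tensor):
        # Compress the fine-tuning delta of one weight matrix to 1 bit per entry:
        # a sign tensor plus a single scalar scale. The scale here is simply the
        # mean absolute delta (an assumption; BitDelta additionally calibrates
        # scales against the fine-tuned model).
        delta = w_fine - w_base
        sign = torch.sign(delta)
        sign[sign == 0] = 1.0          # treat exact zeros as +1
        scale = delta.abs().mean()
        return sign.to(torch.int8), scale

    def apply_delta(w_base: torch.Tensor, sign: torch.Tensor, scale: torch.Tensor):
        # Reconstruct an approximate fine-tuned weight matrix from base + 1-bit delta.
        return w_base + scale * sign.to(w_base.dtype)

    if __name__ == "__main__":
        torch.manual_seed(0)
        w_base = torch.randn(1024, 1024)
        w_fine = w_base + 0.01 * torch.randn(1024, 1024)   # toy stand-in for fine-tuned weights
        sign, scale = binarize_delta(w_base, w_fine)
        w_hat = apply_delta(w_base, sign, scale)
        print("mean abs reconstruction error:", (w_hat - w_fine).abs().mean().item())

Stored as packed bits plus one higher-precision scale per matrix, such a delta takes roughly 1/16 the space of an fp16 delta, which is in line with the greater-than-10x GPU memory reduction the abstract reports for serving many fine-tunes over a shared base model.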

Locations

  • arXiv (Cornell University)

Similar Works

  • HALO: Hadamard-Assisted Lossless Optimization for Efficient Low-Precision LLM Training and Fine-Tuning (2025). Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh
  • LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (2024). Justin Zhao, Timothy C. Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
  • Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models (2023). Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi
  • Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity (2024). Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De, X.P. Yu
  • DeltaZip: Multi-Tenant Language Model Serving via Delta Compression (2023). Xiaozhe Yao, Ana Klimovic
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (2020). Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
  • Low-Rank Quantization-Aware Training for LLMs (2024). Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel
  • ProTrain: Efficient LLM Training via Memory-Aware Techniques (2024). Hanmei Yang, Jin Zhou, Yao Fu, X. Wang, Ramine Roane, Hui Guan, Tongping Liu
  • FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs (2023). Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla
  • FP8-LM: Training FP8 Large Language Models (2023). Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu
  • Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (2024). Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang
  • TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training (2024). Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang
  • BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models (2024). Qijun Luo, Hengxu Yu, Xiao Li
  • Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization (2023). Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee
  • A Study of Optimizations for Fine-tuning Large Language Models (2024). Arjun Singh, N. Pandey, Anup Shirgaonkar, Pavan Manoj, Vijay Aski
  • CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs (2024). Wei Jie Lv, Xuan Xia, Shengjun Huang
  • MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (2024). Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao Muhammad Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan
  • CoLLiE: Collaborative Training of Large Language Models in an Efficient Way (2023). Kai Lv, Shuo Zhang, Tianle Gu, Shuhao Xing, Jiawei Hong, Keyu Chen, Xiaoran Liu, Yuqing Yang, Honglin Guo, Tengxiao Liu
  • Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models (2024). Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff
  • LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (2024). Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Yongqiang Ma

Works That Cite This (0)

Works Cited by This (0)
