SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Type: Preprint

Publication Date: 2024-06-04

Citations: 1

DOI: https://doi.org/10.48550/arxiv.2406.02214

Abstract

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for parameter- and memory-efficient fine-tuning, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component we employ a simple strategy: the sparsity support is selected uniformly at random and only the non-zero entries on that fixed support are learned. Despite its simplicity, this random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameter and memory cost compared to pretraining with a low-rank parameterization, yet achieves substantially better performance that is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
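To make the parameterization concrete, the sketch below implements a sparse-plus-low-rank linear layer in PyTorch, where the weight is formed as W = BA + S and S is supported on a fixed, uniformly random index set whose values are the only sparse parameters learned. This is a minimal illustration under stated assumptions: the class name SLTrainLinear, the rank, the sparsity level, and the initialization scale are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class SLTrainLinear(nn.Module):
    """Sketch of a sparse-plus-low-rank linear layer: W = B @ A + S,
    with S supported on a fixed, uniformly random index set."""

    def __init__(self, in_features, out_features, rank=128, sparsity=0.03):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

        # Low-rank component, learned as a plain matrix factorization B @ A.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

        # Sparse component: choose the support uniformly at random once,
        # keep it fixed, and learn only the values on that support.
        # (rank, sparsity, and init scale are illustrative assumptions.)
        num_nonzero = int(sparsity * in_features * out_features)
        support = torch.randperm(in_features * out_features)[:num_nonzero]
        self.register_buffer("support", support)              # fixed indices
        self.values = nn.Parameter(torch.zeros(num_nonzero))  # learned entries

    def forward(self, x):
        # Materialize S from its fixed support, then form W = B @ A + S.
        S = torch.zeros(self.out_features * self.in_features,
                        device=x.device, dtype=x.dtype)
        S[self.support] = self.values
        S = S.view(self.out_features, self.in_features)
        W = self.B @ self.A + S
        return x @ W.t()


# Example: a drop-in stand-in for nn.Linear(1024, 1024) during pretraining.
layer = SLTrainLinear(1024, 1024, rank=128, sparsity=0.03)
out = layer(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 1024])
```

In this reading, the low-rank factors and the fixed-support sparse values together hold far fewer parameters than the dense weight, while the random support lets the learned weight escape a purely low-dimensional subspace.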

Locations

  • arXiv (Cornell University)

Similar Works

  • Sparse Low-rank Adaptation of Pre-trained Language Models (2023) - Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun
  • Gradient Weight-normalized Low-rank Projection for Efficient LLM Training (2024) - Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas
  • LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning (2023) - Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li
  • PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation (2024) - Nadav Benedek, Lior Wolf
  • MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (2024) - Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Wei‐Wei Deng, Feng Sun, Qi Zhang, Deqing Wang
  • LoRA-Mini: Adaptation Matrices Decomposition and Selective Training (2024) - Ayush Singh, Rajdeep Aher, Shivank Garg
  • Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning (2024) - Wenhan Xia, Chengwei Qin, Elad Hazan
  • ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers (2024) - Junyan Hu, Xue Xiao, Mengqi Zhang, Xiao Chen, Zhaochun Ren, Zhumin Chen, Pengjie Ren
  • DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models (2021) - Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah
  • Low-Rank Adaptation for Foundation Models: A Comprehensive Review (2024) - Menglin Yang, Jialin Chen, Yifei Zhang, Jiahong Liu, Jiasheng Zhang, Q. M. Ma, Himanshu Verma, Qianru Zhang, Min Zhou, Irwin King
  • GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning (2024) - Abdessalam Ed-dib, Zhanibek Datbayev, Amine Mohamed Aboussalah
  • Scaling Sparse Fine-Tuning to Large Language Models (2024) - Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo Maria Ponti
  • ReLoRA: High-Rank Training Through Low-Rank Updates (2023) - Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
  • Flexora: Flexible Low Rank Adaptation for Large Language Models (2024) - C.-H. Wei, Yao Shu, Yong He, F. Richard Yu
  • Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices (2023) - Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam‐Fai Wong, Lei Zhang
  • Efficient Fine-Tuning of Large Language Models via a Low-Rank Gradient Estimator (2024) - Luoming Zhang, Zhenyu Lou, Yangwei Ying, Cheng Yang, Hong Zhou
  • VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (2024) - Li Yang, Shaobo Han, Shihao Ji
  • Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models (2024) - Xinxin Liu, Aaron Thomas, Cheng Zhang, Jianyi Cheng, Yiren Zhao, Xitong Gao
  • FineGates: LLMs Finetuning with Compression using Stochastic Gates (2024) - Jonathan Svirsky, Yehonathan Refael, Ofir Lindenbaum

Works That Cite This (0)


Works Cited by This (0)
