Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Type: Preprint

Publication Date: 2024-12-17

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2412.13337

Abstract

The rise of large language models (LLMs) has created a significant disparity: industrial research labs with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters like warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observed no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample efficient. With these findings holding robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.
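
The configuration guidance summarized in the abstract maps onto a fairly standard fine-tuning script. Below is a minimal, hypothetical sketch (not the authors' released code) of how findings (i)-(iv) could be reflected using the Hugging Face transformers and datasets libraries; the model name, dataset files, and every hyperparameter value are illustrative assumptions rather than the paper's reported settings.

```python
# Minimal sketch of a supervised fine-tuning setup mirroring the abstract's findings.
# All names and values below are illustrative assumptions, not the paper's settings.
from datasets import concatenate_datasets, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder: any 3B-7B base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Finding (iv): "stacked" training -- mix all instruction datasets into a single
# shuffled corpus instead of training on them in sequential phases.
mixture = concatenate_datasets([
    load_dataset("json", data_files="skills.jsonl", split="train"),     # hypothetical files
    load_dataset("json", data_files="knowledge.jsonl", split="train"),
]).shuffle(seed=42)

def tokenize(example):
    # Assumes each record carries a "text" field holding the formatted prompt + response.
    return tokenizer(example["text"], truncation=True, max_length=2048)

mixture = mixture.map(tokenize, remove_columns=mixture.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # plain causal-LM loss

args = TrainingArguments(
    output_dir="sft-run",
    # Finding (i): favour a large effective batch size with a lower peak learning rate.
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,   # 128 sequences per optimizer step per GPU
    learning_rate=1e-5,
    # Finding (iii): simple schedule choices (short warmup, standard decay) did not hurt.
    warmup_steps=100,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    bf16=True,
    logging_steps=10,                 # log often so early training dynamics are visible
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mixture,
    data_collator=collator,
)
trainer.train()

# Finding (ii): early-stage dynamics predict final quality -- per the abstract, runs
# with lower gradient norms and higher loss early on tend to end up better, so runs
# that look unfavourable after a few hundred steps can be stopped to save compute.
early_logs = [r for r in trainer.state.log_history if "grad_norm" in r][:20]
print([(r.get("step"), r.get("grad_norm"), r.get("loss")) for r in early_logs])
```

Whether a grad_norm field appears in the trainer logs depends on the transformers version; the point is only that loss and gradient-norm traces from the first few hundred steps are worth inspecting before committing to a full run.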

Locations

  • arXiv (Cornell University)

Similar Works

  • Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models (2024) - Dheeraj Mekala, Alex Nguyen, Jingbo Shang
  • Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications (2024) - Alon Halfon, Shai Gretz, Ofir Arviv, Artem Spector, Orith Toledo‐Ronen, Yoav Katz, Liat Ein‐Dor, Michal Shmueli-Scheuer, Noam Slonim
  • LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models (2023) - Yixuan Weng, Zhiqi Wang, Huanxuan Liao, Shizhu He, Shengping Liu, Kang Liu, Jun Zhao
  • 52B to 1T: Lessons Learned via Tele-FLM Series (2024) - Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Y. B. Zhao, Xin Wang, Yu‐Yao Huang
  • MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications (2023) - Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao
  • Hyperparameter Optimization for Large Language Model Instruction-Tuning (2023) - Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev
  • FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning (2023) - Xinyi Wang, John Wieting, Jonathan H. Clark
  • The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities (2024) - Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid
  • TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (2024) - Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Sheng‐Yi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu
  • How to Train BERT with an Academic Budget (2021) - Peter Izsak, Moshe Berchansky, Omer Levy
  • Experience of Training a 1.7B-Parameter LLaMa Model From Scratch (2024) - Miles Q. Li, Benjamin C. M. Fung, Shih‐Chia Huang
  • Fine-Tuning Language Models with Just Forward Passes (2023) - Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
  • Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models (2024) - Nigel Doering, Cyril Gorlla, Trevor Tuttle, Adhvaith Vijay
  • HFT: Half Fine-Tuning for Large Language Models (2024) - Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu
  • SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models (2024) - Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
  • Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models (2024) - James Vo
  • DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models (2024) - Ranchi Zhao, Zhen Leng Thai, Yifan Zhang, Shengding Hu, Yunqi Ba, Jie Zhou, Jie Cai, Zhiyuan Liu, Maosong Sun
  • Metadata Conditioning Accelerates Language Model Pre-training (2025) - Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen
