Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Type: Preprint

Publication Date: 2024-12-17

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2412.13337

Abstract

The rise of large language models (LLMs) has created a significant disparity: industrial research labs with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training recommended by Orca. Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance, enabling early termination of sub-optimal runs and significant computational savings; (iii) through a thorough exploration of hyperparameters like warmup steps and learning rate schedules, we provide guidance for practitioners and find that certain simplifications do not compromise performance; and (iv) we observed no significant difference in performance between phased and stacked training strategies, but stacked training is simpler and more sample efficient. With these findings holding robustly across datasets and models, we hope this study serves as a guide for practitioners fine-tuning small LLMs and promotes a more inclusive environment for LLM research.
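
The configuration guidance summarized in the abstract maps onto a fairly standard fine-tuning script. Below is a minimal, hypothetical sketch (not the authors' released code) of how findings (i)-(iv) could be reflected using the Hugging Face transformers and datasets libraries; the model name, dataset files, and every hyperparameter value are illustrative assumptions rather than the paper's reported settings.

```python
# Minimal sketch of a supervised fine-tuning setup mirroring the abstract's findings.
# All names and values below are illustrative assumptions, not the paper's settings.
from datasets import concatenate_datasets, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder: any 3B-7B base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Finding (iv): "stacked" training -- mix all instruction datasets into a single
# shuffled corpus instead of training on them in sequential phases.
mixture = concatenate_datasets([
    load_dataset("json", data_files="skills.jsonl", split="train"),     # hypothetical files
    load_dataset("json", data_files="knowledge.jsonl", split="train"),
]).shuffle(seed=42)

def tokenize(example):
    # Assumes each record carries a "text" field holding the formatted prompt + response.
    return tokenizer(example["text"], truncation=True, max_length=2048)

mixture = mixture.map(tokenize, remove_columns=mixture.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # plain causal-LM loss

args = TrainingArguments(
    output_dir="sft-run",
    # Finding (i): favour a large effective batch size with a lower peak learning rate.
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,   # 128 sequences per optimizer step per GPU
    learning_rate=1e-5,
    # Finding (iii): simple schedule choices (short warmup, standard decay) did not hurt.
    warmup_steps=100,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    bf16=True,
    logging_steps=10,                 # log often so early training dynamics are visible
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mixture,
    data_collator=collator,
)
trainer.train()

# Finding (ii): early-stage dynamics predict final quality -- per the abstract, runs
# with lower gradient norms and higher loss early on tend to end up better, so runs
# that look unfavourable after a few hundred steps can be stopped to save compute.
early_logs = [r for r in trainer.state.log_history if "grad_norm" in r][:20]
print([(r.get("step"), r.get("grad_norm"), r.get("loss")) for r in early_logs])
```

Whether a grad_norm field appears in the trainer logs depends on the transformers version; the point is only that loss and gradient-norm traces from the first few hundred steps are worth inspecting before committing to a full run.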

Locations

  • arXiv (Cornell University)

Similar Works

  • Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models (2024) - Dheeraj Mekala, Alex Nguyen, Jingbo Shang
  • Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications (2024) - Alon Halfon, Shai Gretz, Ofir Arviv, Artem Spector, Orith Toledo‐Ronen, Yoav Katz, Liat Ein‐Dor, Michal Shmueli-Scheuer, Noam Slonim
  • LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models (2023) - Yixuan Weng, Zhiqi Wang, Huanxuan Liao, Shizhu He, Shengping Liu, Kang Liu, Jun Zhao
  • 52B to 1T: Lessons Learned via Tele-FLM Series (2024) - Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Y. B. Zhao, Xin Wang, Yu‐Yao Huang
  • MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications (2023) - Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao
  • Hyperparameter Optimization for Large Language Model Instruction-Tuning (2023) - Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev
  • FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning (2023) - Xinyi Wang, John Wieting, Jonathan H. Clark
  • The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities (2024) - Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid
  • TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (2024) - Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Sheng‐Yi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu
  • How to Train BERT with an Academic Budget (2021) - Peter Izsak, Moshe Berchansky, Omer Levy
  • Experience of Training a 1.7B-Parameter LLaMa Model From Scratch (2024) - Miles Q. Li, Benjamin C. M. Fung, Shih‐Chia Huang
  • Fine-Tuning Language Models with Just Forward Passes (2023) - Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
  • Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models (2024) - Nigel Doering, Cyril Gorlla, Trevor Tuttle, Adhvaith Vijay
  • HFT: Half Fine-Tuning for Large Language Models (2024) - Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu
  • SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models (2024) - Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
  • Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models (2024) - James Vo
  • DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models (2024) - Ranchi Zhao, Zhen Leng Thai, Yifan Zhang, Shengding Hu, Yunqi Ba, Jie Zhou, Jie Cai, Zhiyuan Liu, Maosong Sun
  • Metadata Conditioning Accelerates Language Model Pre-training (2025) - Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen
