Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Type: Preprint

Publication Date: 2024-07-22

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2407.15762

Abstract

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP can learn steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework learns steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.
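
To make the abstract's weight-conditioning idea concrete, the sketch below shows one plausible shape of such a training loop: sample a reward weighting, condition generation on it, and optimize the scalarized reward, so a single model covers all trade-offs and is steered at inference time by the chosen weights. This is a minimal illustration under stated assumptions, not the paper's implementation; the policy/reward interfaces (generate, update, the reward functions) are hypothetical placeholders.

```python
import random

# Minimal, hypothetical sketch of weight-conditioned multi-objective
# finetuning in the spirit of the CLP abstract (not the paper's code).
# Two conflicting objectives (e.g., creativity vs. safety) are scalarized
# with a sampled weight vector, and the policy receives that weight vector
# as an explicit conditioning input, so one set of parameters can serve
# every trade-off at inference time.

def sample_weights(num_objectives: int) -> list[float]:
    """Draw a random weighting on the probability simplex."""
    raw = [random.random() for _ in range(num_objectives)]
    total = sum(raw)
    return [r / total for r in raw]

def scalarize(rewards: list[float], weights: list[float]) -> float:
    """Linear scalarization of per-objective rewards."""
    return sum(w * r for w, r in zip(weights, rewards))

def training_step(policy, reward_fns, prompt: str) -> float:
    """One schematic RL finetuning step for a weight-conditioned policy."""
    weights = sample_weights(len(reward_fns))
    # The weights are part of the policy's input (hypothetical API);
    # this is what makes the trained model steerable at inference time.
    response = policy.generate(prompt, condition=weights)
    rewards = [fn(prompt, response) for fn in reward_fns]
    objective = scalarize(rewards, weights)
    policy.update(objective, prompt, response, condition=weights)
    return objective

class ToyPolicy:
    """Stand-in policy so the sketch runs; a real policy would be an LM."""
    def generate(self, prompt, condition):
        return f"response to {prompt!r} with weights {condition}"
    def update(self, objective, prompt, response, condition):
        pass  # a real implementation would apply e.g. a PPO/REINFORCE update

if __name__ == "__main__":
    creativity = lambda p, r: random.random()  # placeholder reward models
    safety = lambda p, r: random.random()
    policy = ToyPolicy()
    for _ in range(3):
        training_step(policy, [creativity, safety], "write a short story")
    # At inference, steering is just a choice of conditioning weights:
    print(policy.generate("write a short story", condition=[0.9, 0.1]))  # creative
    print(policy.generate("write a short story", condition=[0.1, 0.9]))  # safe
```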

Locations

  • arXiv (Cornell University)

Similar Works

  • Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models (2024). Wenxuan Zhang, Philip H. S. Torr, Mohamed Elhoseiny, Adel Bibi.
  • Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment (2024). Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Yu Dong, Jianshu Chen.
  • Matryoshka: Learning to Drive Black-Box LLMs with LLMs (2024). Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. L. Dai, Chao Zhang, Bo Dai.
  • Fine-Tuning Language Models with Reward Learning on Policy (2024). Lang Hao, Fei Huang, Yongbin Li.
  • Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features (2023). Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Victor Alonso, Charlie Griffin, Bogdan-Ionut Cîrstea.
  • Aligning Large Language Models with Human Preferences through Representation Engineering (2023). Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang.
  • Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards (2024). Haoxiang Wang, Yong Lin, Wei Xiong, Ruizhao Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang.
  • A Comprehensive Survey of Datasets, Theories, Variants, and Applications in Direct Preference Optimization (2024). Wenyi Xiao, Zhenning Wang, Leilei Gan, Shuai Zhao, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu.
  • Direct Alignment of Language Models via Quality-Aware Self-Refinement (2024). Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok.
  • Reinforcement Learning Enhanced LLMs: A Survey (2024). Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy.
  • Fine-Tuning Language Models with Advantage-Induced Policy Alignment (2023). Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao.
  • Vanishing Gradients in Reinforcement Finetuning of Language Models (2023). Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin.
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (2024). Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu.
  • Enhancing LLM Safety via Constrained Direct Preference Optimization (2024). Zixuan Liu, Xiaolin Sun, Zizhan Zheng.
  • LongReward: Improving Long-context Large Language Models with AI Feedback (2024). Jiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li.
  • Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives (2024). Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu.
  • Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data (2024). Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang.
  • MetaAligner: Conditional Weak-to-Strong Correction for Generalizable Multi-Objective Alignment of Language Models (2024). Kailai Yang, Zhiwei Liu, Qianqian Xie, Tianlin Zhang, Nirui Song, Jimin Huang, Ziyan Kuang, Sophia Ananiadou.
  • SCULPT: Systematic Tuning of Long Prompts (2024). Shanu Kumar, Akhila Yesantarao Venkata, S K Khandelwal, Bishal Santra, Pavan Kumar Agrawal, Manish Gupta.
  • Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization (2024). Subhojyoti Mukherjee, Anusha Lalitha, Sailik Sengupta, Aniket Anand Deshmukh, Branislav Kveton.

Works That Cite This (0)

None.

Works Cited by This (0)

None.