Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Type: Preprint

Publication Date: 2024-02-15

Citations: 1

DOI: https://doi.org/10.48550/arxiv.2402.10342

Abstract

Reinforcement Learning from Human Feedback (RLHF) has achieved impressive empirical success while relying on a small amount of human feedback. However, there is limited theoretical justification for this phenomenon. Moreover, most recent theoretical work focuses on value-based algorithms despite the empirical successes of policy-based methods. In this work, we study an RLHF algorithm based on policy optimization (PO-RLHF). The algorithm builds on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function. In PO-RLHF, the reward function is not assumed to be known; instead, the algorithm infers it from trajectory-based comparison feedback. We provide performance bounds for PO-RLHF with low query complexity, offering insight into why a small amount of human feedback may suffice for good performance with RLHF. A key novelty is our trajectory-level elliptical potential analysis, used to bound the error in the inferred reward parameters when the learner observes comparison queries rather than reward values. We provide and analyze algorithms in two settings: PG-RLHF for linear function approximation and NN-PG-RLHF for neural function approximation.
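The core step the abstract describes, inferring a reward function from trajectory-based comparison feedback, can be sketched with a standard preference model. The snippet below is a minimal illustration, not the paper's algorithm: it assumes a linear reward over trajectory-level features and a Bradley-Terry comparison model, and fits the parameters by gradient ascent on the logistic log-likelihood. All function and variable names here are hypothetical.

```python
import numpy as np

def fit_reward_from_comparisons(pairs, prefs, lr=0.5, iters=300):
    """Estimate a linear reward parameter theta from pairwise trajectory
    comparisons under a Bradley-Terry model:
        P(traj_a preferred to traj_b) = sigmoid(theta . (phi_a - phi_b)),
    where phi is a trajectory-level feature (e.g. summed step features).
    `pairs` is a list of (phi_a, phi_b); `prefs` holds 1.0 if a was
    preferred, else 0.0."""
    dim = pairs[0][0].shape[0]
    theta = np.zeros(dim)
    for _ in range(iters):
        grad = np.zeros(dim)
        for (phi_a, phi_b), y in zip(pairs, prefs):
            diff = phi_a - phi_b
            p = 1.0 / (1.0 + np.exp(-np.clip(theta @ diff, -30.0, 30.0)))
            grad += (y - p) * diff  # gradient of the logistic log-likelihood
        theta += lr * grad / len(pairs)
    return theta

# Synthetic check: comparisons generated by a known parameter vector.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.3, 0.0])
pairs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(200)]
prefs = [float((a - b) @ theta_true > 0) for a, b in pairs]
theta_hat = fit_reward_from_comparisons(pairs, prefs)
```

With noiseless synthetic preferences, the recovered direction of `theta_hat` aligns closely with `theta_true`; PO-RLHF's contribution is showing how few such comparison queries are needed when the queried trajectories are collected by an exploratory policy cover.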

Locations

  • arXiv (Cornell University)

Similar Works

  • PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning (2020) - Alekh Agarwal, Mikael Henaff, Sham M. Kakade, Wen Sun
  • Preference-Guided Reinforcement Learning for Efficient Exploration (2024) - Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao
  • Reward-Free Exploration for Reinforcement Learning (2020) - Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
  • Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation (2022) - Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, Liwei Wang
  • Making RL with Preference-based Feedback Efficient via Randomization (2023) - Runzhe Wu, Wen Sun
  • Trajectory-Oriented Policy Optimization with Sparse Rewards (2024) - Guojian Wang, Faguo Wu, Xiao Zhang
  • Efficient Online Reinforcement Learning with Offline Data (2023) - Philip Ball, Laura Smith, Ilya Kostrikov, Sergey Levine
  • Information Directed Reward Learning for Reinforcement Learning (2021) - David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause
  • Human-Inspired Framework to Accelerate Reinforcement Learning (2023) - Ali Beikmohammadi, Sindri Magnússon
  • Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces (2019) - Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow
  • On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game (2021) - Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang
  • Query-Policy Misalignment in Preference-Based Reinforcement Learning (2023) - Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing‐Shan Jia, Ya-Qin Zhang
  • Reflective Policy Optimization (2024) - Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing
  • Offline Prioritized Experience Replay (2023) - Yue Yang, Bingyi Kang, Xiao Ma, Gao Huang, Shiji Song, Shuicheng Yan
  • Bi-Level Offline Policy Optimization with Limited Exploration (2023) - Wenzhuo Zhou

Works That Cite This (0)


Works Cited by This (0)
