Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback

We consider offline reinforcement learning (RL) with preference feedback, in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective is to identify the optimal action for each state, with the ultimate goal of minimizing the simple regret. We propose an …
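The excerpt does not describe the proposed algorithm, so the following is only a minimal Python sketch of the problem setting, not the authors' method. It makes several assumptions that the abstract does not state. Preferences follow a Bradley-Terry-Luce (logistic) model over the linear reward r(s, a) = <phi(s, a), theta>. The parameter is estimated by plain maximum likelihood, and the policy then acts greedily as a stand-in baseline. Simple regret is taken as the uniform average over a finite set of states of max_a r(s, a) - r(s, pi(s)) under the true parameter. All function names (`reward`, `preference_prob`, `fit_mle`, `greedy_policy`, `simple_regret`) and the synthetic data are hypothetical.

```python
import math
import random

def reward(theta, phi):
    """Linear implicit reward r(s, a) = <phi(s, a), theta>."""
    return sum(t * f for t, f in zip(theta, phi))

def preference_prob(theta, phi_a, phi_b):
    """ASSUMED Bradley-Terry-Luce model: P(a preferred over b) = sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(-(reward(theta, phi_a) - reward(theta, phi_b))))

def fit_mle(dataset, dim, lr=0.5, steps=200):
    """Maximum-likelihood estimate of theta from offline tuples (phi_a, phi_b, y),
    where y = 1 if a was preferred. Plain gradient ascent on the logistic log-likelihood."""
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for phi_a, phi_b, y in dataset:
            diff = [fa - fb for fa, fb in zip(phi_a, phi_b)]
            p = 1.0 / (1.0 + math.exp(-sum(t * d for t, d in zip(theta, diff))))
            for i in range(dim):
                grad[i] += (y - p) * diff[i]  # gradient of log-likelihood w.r.t. theta
        theta = [t + lr * g / len(dataset) for t, g in zip(theta, grad)]
    return theta

def greedy_policy(theta_hat, features):
    """For each state, pick the action with the highest estimated reward.
    features[s][a] is the feature vector phi(s, a)."""
    return [max(range(len(acts)), key=lambda a: reward(theta_hat, acts[a]))
            for acts in features]

def simple_regret(theta_star, features, policy):
    """Uniform average over states of max_a r(s, a) - r(s, policy[s]) under theta_star."""
    gaps = []
    for s, acts in enumerate(features):
        best = max(reward(theta_star, phi) for phi in acts)
        gaps.append(best - reward(theta_star, acts[policy[s]]))
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_states, n_actions = 3, 20, 4
    theta_star = [1.0, -0.5, 2.0]  # hypothetical ground-truth parameter
    features = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_actions)]
                for _ in range(n_states)]
    # Synthetic offline dataset: random action pairs labeled by the assumed BTL model.
    dataset = []
    for _ in range(1000):
        acts = features[rng.randrange(n_states)]
        a, b = rng.sample(range(n_actions), 2)
        y = 1 if rng.random() < preference_prob(theta_star, acts[a], acts[b]) else 0
        dataset.append((acts[a], acts[b], y))
    theta_hat = fit_mle(dataset, dim)
    policy = greedy_policy(theta_hat, features)
    print("estimated theta:", theta_hat)
    print("simple regret:", simple_regret(theta_star, features, policy))
```

The greedy policy achieves zero simple regret whenever the estimated parameter ranks every state's actions the same way the true parameter does. This is why instance-dependent quantities, such as the reward gaps between the best and second-best actions, can matter for how much data an offline learner needs.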