Related Paper

Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback

We consider offline reinforcement learning (RL) with preference feedback, in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective is to identify the optimal action for each state, with the ultimate goal of minimizing the simple regret. We propose an …
