The Fallacy of Minimizing Local Regret in the Sequential Task Setting

In reinforcement learning (RL), online RL is often framed as an optimization problem in which an algorithm interacts with an unknown environment to minimize cumulative regret. In a stationary setting, strong theoretical guarantees, such as a sublinear ($\sqrt{T}$) regret bound, can be obtained, which typically implies convergence to …