The Fallacy of Minimizing Local Regret in the Sequential Task Setting
In Reinforcement Learning (RL), online RL is often framed as an optimization problem in which an algorithm interacts with an unknown environment to minimize cumulative regret. In a stationary setting, strong theoretical guarantees, such as a sublinear ($\sqrt{T}$) regret bound, can be obtained, which typically implies convergence to …