The Value-Improvement Path: Towards Better Representations for Reinforcement Learning
In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary approximation problem but a sequence of value-prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take …