Exploration Versus Exploitation in Reinforcement Learning: A Stochastic Control Approach
We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black-box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory …