Exploration Versus Exploitation in Reinforcement Learning: A Stochastic Control Approach

We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black-box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory …
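
To make the idea of an entropy-regularized reward concrete, the following is a minimal one-step sketch and not the paper's continuous-time formulation. It assumes a hypothetical quadratic reward `-(u - target)**2`, a one-dimensional Gaussian action distribution, and a hypothetical weight `temperature` on the differential entropy term; all function and parameter names are illustrative.

```python
import math


def gaussian_differential_entropy(std: float) -> float:
    """Differential entropy (in nats) of a 1-D Gaussian with standard deviation `std`."""
    if std <= 0:
        raise ValueError("std must be positive")
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def regularized_reward(mean: float, std: float, target: float, temperature: float) -> float:
    """Expected toy reward plus an entropy bonus for actions u ~ N(mean, std^2).

    Assumption: the reward is -(u - target)^2, whose expectation under the
    Gaussian is -((mean - target)^2 + std^2). This is an illustrative choice.
    """
    expected_reward = -((mean - target) ** 2 + std ** 2)
    return expected_reward + temperature * gaussian_differential_entropy(std)


if __name__ == "__main__":
    # Compare a narrow, a moderate, and a wide action distribution.
    for std in (0.5, 1.0, 1.5):
        value = regularized_reward(mean=0.0, std=std, target=0.0, temperature=2.0)
        print(f"std={std:.1f}  regularized reward={value:.4f}")
```

In this toy setting the entropy term rewards spreading actions out (exploration), while the quadratic term penalizes straying from the target (exploitation). Setting the derivative with respect to `std` to zero gives `std**2 = temperature / 2`, so a larger `temperature` favors a wider action distribution.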