Related Paper

Exploration Versus Exploitation in Reinforcement Learning: A Stochastic Control Approach

We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory …