Entropy-regularized Diffusion Policy with Q-Ensembles for Offline
Reinforcement Learning
This paper presents advanced techniques for training diffusion policies for offline reinforcement learning (RL). At the core is a mean-reverting stochastic differential equation (SDE) that transforms a complex action distribution into a standard Gaussian and then samples actions conditioned on the environment state with a corresponding reverse-time SDE, like a …
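As an illustration of the forward direction only, the following is a minimal sketch of a mean-reverting SDE pushing a multimodal action distribution toward a standard Gaussian. It assumes a constant-coefficient Ornstein-Uhlenbeck form, dx = -theta * x dt + sqrt(2 * theta) dW, whose stationary distribution is N(0, 1); the paper's actual drift and noise schedule may differ, and the function name `forward_mean_reverting`, the value of `theta`, and the bimodal toy actions are hypothetical choices made for this example.

```python
import numpy as np


def forward_mean_reverting(x0, t, theta=1.0, rng=None):
    """Sample x_t given x_0 under dx = -theta * x dt + sqrt(2 * theta) dW.

    Assumption: constant theta and noise scale sqrt(2 * theta), so the
    transition is Gaussian in closed form and the stationary law is N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    decay = np.exp(-theta * t)            # how much of x_0 survives at time t
    noise_std = np.sqrt(1.0 - decay**2)   # std of the accumulated noise
    return x0 * decay + noise_std * rng.standard_normal(x0.shape)


# Toy "complex" action distribution: two narrow modes near -2 and +2.
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=20000)
actions = 2.0 * signs + 0.1 * rng.standard_normal(20000)

# After a long enough horizon the samples are close to standard Gaussian.
x_T = forward_mean_reverting(actions, t=5.0, rng=rng)
print(f"mean={x_T.mean():.3f}, std={x_T.std():.3f}")
```

The reverse-time SDE, which would map Gaussian noise back to state-conditioned actions, requires a learned score or noise network and is not shown in this sketch.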