Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning

This paper presents advanced techniques for training diffusion policies for offline reinforcement learning (RL). At the core is a mean-reverting stochastic differential equation (SDE) that transforms a complex action distribution into a standard Gaussian; actions are then sampled, conditioned on the environment state, with the corresponding reverse-time SDE, like a …
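
To illustrate the forward half of this idea, the following is a minimal sketch, not the paper's implementation. It assumes a constant-coefficient mean-reverting SDE, dx = -theta * x dt + sqrt(2 * theta) dW, whose stationary distribution is a standard Gaussian; the paper's actual drift and noise schedule may differ. The function name `forward_mean_reverting_sde`, the parameters `theta`, `t_end`, and `n_steps`, and the bimodal toy "action" data are all hypothetical choices made for illustration.

```python
import numpy as np


def forward_mean_reverting_sde(x0, theta=1.0, t_end=5.0, n_steps=500, seed=0):
    """Euler-Maruyama simulation of dx = -theta * x dt + sqrt(2 * theta) dW.

    Assumption: constant theta and a noise scale tied to it so that the
    stationary distribution is N(0, 1). This is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    sigma = np.sqrt(2.0 * theta)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift pulls samples toward 0; diffusion injects Gaussian noise.
        x = x + (-theta * x) * dt + sigma * np.sqrt(dt) * noise
    return x


# Hypothetical 1-D "action" data: two narrow modes at -2 and +2.
data_rng = np.random.default_rng(1)
modes = np.where(data_rng.random(20000) < 0.5, -2.0, 2.0)
actions = modes + 0.1 * data_rng.standard_normal(20000)

# After running the forward SDE, the samples should be close to N(0, 1).
x_T = forward_mean_reverting_sde(actions)
print("terminal mean:", x_T.mean())
print("terminal variance:", x_T.var())
```

Sampling actions would run the corresponding reverse-time SDE with a learned, state-conditioned score model, which this sketch does not attempt to show.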