Ask a Question

Prefer a chat interface with context about you and your work?

Learning Value Functions in Deep Policy Gradients using Residual Variance

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on …