Learning Value Functions in Deep Policy Gradients using Residual Variance
Learning Value Functions in Deep Policy Gradients using Residual Variance
Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on …