Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Policy gradient methods have been applied with great success to problems in control and reinforcement learning, yet existing convergence analyses still rely on non-intuitive, impractical, and often opaque conditions. In particular, existing rates are achieved only in limited settings, under strict regularity conditions. In this work, we establish explicit convergence …