Global Optimality Guarantees for Policy Gradient Methods
Policy gradient methods, which have driven much of the recent success in reinforcement learning, search for an optimal policy within a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees as the optimization …
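The following is a minimal sketch of the setup described above. It runs gradient descent on the expected discounted cost-to-go J(θ) = E_{s_0 ~ ρ}[V^π(s_0)] for a tabular softmax policy on a small, hypothetical two-state MDP. Several simplifications are assumptions made for illustration and are not taken from the paper. The sketch uses exact gradients, computed from the policy gradient theorem, rather than stochastic estimates. It also uses a fixed conservative step size. The names `P`, `c`, `rho`, `gamma`, `eta`, and the helper functions are hypothetical. On this MDP the uniform initial policy has cost 1.25, and the optimal cost, obtained by value iteration, is 0.5.

```python
import numpy as np


def softmax(theta):
    # Row-wise softmax: one action distribution per state.
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cost_and_gradient(theta, P, c, rho, gamma):
    """Exact expected discounted cost J(theta) under rho, and its gradient.

    Hypothetical helper. P has shape (S, A, S), c has shape (S, A), rho has shape (S,).
    """
    S = P.shape[0]
    pi = softmax(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)        # state-to-state transitions under pi
    c_pi = (pi * c).sum(axis=1)                  # expected one-step cost under pi
    I = np.eye(S)
    V = np.linalg.solve(I - gamma * P_pi, c_pi)  # cost-to-go V^pi
    Q = c + gamma * P @ V                        # Q^pi(s, a), shape (S, A)
    # Normalized discounted state-visitation distribution d^pi_rho.
    d = (1 - gamma) * np.linalg.solve((I - gamma * P_pi).T, rho)
    # Policy gradient theorem specialized to tabular softmax:
    # dJ/dtheta[s, a] = d(s) * pi(a|s) * (Q(s, a) - V(s)) / (1 - gamma)
    grad = d[:, None] * pi * (Q - V[:, None]) / (1 - gamma)
    return rho @ V, grad


def optimal_cost(P, c, rho, gamma, iters=1000):
    # Value iteration for cost minimization, used as a reference optimum.
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (c + gamma * P @ V).min(axis=1)
    return rho @ V


# Hypothetical toy MDP: action 0 stays in the current state, action 1 switches.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
c = np.array([[1.0, 1.0],    # state 0 is costly whatever the action
              [0.0, 0.5]])   # staying in state 1 is free
rho = np.array([0.5, 0.5])   # initial state distribution
gamma = 0.5
eta = (1 - gamma) ** 3 / 8   # assumed conservative step size for costs in [0, 1]

theta = np.zeros_like(c)     # uniform initial policy
history = []
for _ in range(10_000):
    J, grad = cost_and_gradient(theta, P, c, rho, gamma)
    history.append(J)
    theta -= eta * grad      # gradient *descent* on expected cost-to-go

J_star = optimal_cost(P, c, rho, gamma)
print(f"initial J = {history[0]:.4f}, final J = {history[-1]:.4f}, optimal J = {J_star:.4f}")
```

Replacing the exact `grad` with a sampled estimate, for example a REINFORCE-style return times the score function, would recover the stochastic variant the abstract refers to. The exact version keeps the example deterministic, and it isolates the optimization landscape from sampling noise.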