Global Optimality Guarantees for Policy Gradient Methods

Policy gradient methods, which have powered much of the recent success in reinforcement learning, search for an optimal policy within a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees as the optimization …
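To make the procedure described above concrete, here is a minimal sketch of stochastic gradient descent on the expected discounted cost-to-go. It is not taken from the paper. The two-state MDP, its costs and transition probabilities, the softmax parameterization, the truncated rollout horizon, and every function name (`policy`, `expected_cost`, `sample_gradient`, `train`) are assumptions chosen purely for illustration.

```python
import math
import random

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
P = [[[0.9, 0.1], [0.2, 0.8]],   # P[s][a][s']: transition probabilities
     [[0.8, 0.2], [0.1, 0.9]]]
C = [[1.0, 2.0], [0.5, 0.0]]     # C[s][a]: per-step cost
GAMMA = 0.9                      # discount factor
RHO = [0.5, 0.5]                 # initial state distribution

def policy(theta):
    """Softmax policy: pi[s][a] is proportional to exp(theta[s][a])."""
    pi = []
    for row in theta:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        pi.append([x / sum(e) for x in e])
    return pi

def expected_cost(theta, sweeps=500):
    """Discounted cost-to-go under RHO, via iterative policy evaluation."""
    pi = policy(theta)
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [sum(pi[s][a] * (C[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2)))
                 for a in range(2)) for s in range(2)]
    return sum(RHO[s] * v[s] for s in range(2))

def draw(rng, probs):
    """Sample index 0 or 1 from a length-2 probability vector."""
    return int(rng.random() < probs[1])

def sample_gradient(theta, rng, horizon=50):
    """Score-function (REINFORCE-style) gradient estimate from one truncated rollout."""
    pi = policy(theta)
    s = draw(rng, RHO)
    path = []
    for _ in range(horizon):
        a = draw(rng, pi[s])
        path.append((s, a, C[s][a]))
        s = draw(rng, P[s][a])
    grad = [[0.0, 0.0], [0.0, 0.0]]
    cost_to_go = 0.0
    for t in reversed(range(horizon)):
        s, a, c = path[t]
        cost_to_go = c + GAMMA * cost_to_go  # discounted cost from step t onward
        for b in range(2):
            score = (1.0 if b == a else 0.0) - pi[s][b]  # d log pi(a|s) / d theta[s][b]
            grad[s][b] += GAMMA ** t * cost_to_go * score
    return grad

def train(steps=1000, lr=0.01, seed=0):
    """Stochastic gradient descent on the expected cost, starting from a uniform policy."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        g = sample_gradient(theta, rng)
        theta = [[theta[s][a] - lr * g[s][a] for a in range(2)] for s in range(2)]
    return theta
```

Calling `expected_cost(train())` and comparing it with `expected_cost([[0.0, 0.0], [0.0, 0.0]])` shows how far the noisy updates moved the policy from its uniform starting point. The sketch illustrates only the update rule; it says nothing about whether the iterates reach a globally optimal policy in general, which is the question the paper addresses.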