Ask AI a math question

Related Paper

Global Optimality Guarantees for Policy Gradient Methods

Policy gradient methods, which have powered a lot of recent success in reinforcement learning, search for an optimal policy in a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees as the optimization …

Ask a Question