Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov decision process (MDP) with n states and a total number m of actions, we study the number of iterations needed by policy iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard’s PI that changes the actions in all states …
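To make the setting concrete, here is a minimal sketch of Howard's policy iteration on a tabular MDP, evaluating the current policy exactly and then improving greedily in every state at once. This is an illustrative implementation under standard assumptions (finite states and actions, known transition tensor `P` and reward matrix `R`, discount `gamma` in [0, 1)), not the paper's analysis.

```python
import numpy as np

def howard_policy_iteration(P, R, gamma, max_iters=100):
    """Howard's PI: greedy improvement in all states on each sweep.

    P: (m_actions, n_states, n_states) transition matrix per action
    R: (n_states, m_actions) expected immediate rewards
    gamma: discount factor in [0, 1)
    Returns (policy, value function, number of iterations used).
    """
    n, m = R.shape
    policy = np.zeros(n, dtype=int)           # arbitrary initial policy
    for it in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(n)]        # row s is P[policy[s], s, :]
        r_pi = R[np.arange(n), policy]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: greedy action in every state (Howard's rule).
        Q = R + gamma * np.einsum('aij,j->ia', P, v)   # (n, m) action values
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v, it + 1          # policy is stable: optimal
        policy = new_policy
    return policy, v, max_iters
```

The iteration count returned here is exactly the quantity the abstract's bounds concern: how many evaluation-improvement sweeps Howard's PI can take before the policy stabilizes.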