Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov decision process (MDP) with n states and a total of m actions, we study the number of iterations needed by policy iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variants of PI: Howard’s PI, which changes the actions in all states …
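For concreteness, here is a minimal sketch of Howard’s PI for a finite discounted MDP: each iteration evaluates the current policy exactly (by solving a linear system) and then switches to a greedy action in every improvable state simultaneously. The function name, array layout, and stopping rule below are illustrative assumptions, not the paper’s notation or code.

```python
import numpy as np

def howard_pi(P, r, gamma, max_iter=10_000):
    """Howard's policy iteration (illustrative sketch).

    P     : (A, n, n) array; P[a, s, s'] = transition probability (assumed layout)
    r     : (n, A) array of immediate rewards
    gamma : discount factor in [0, 1)
    Returns a length-n array of optimal action indices.
    """
    A, n, _ = P.shape
    pi = np.zeros(n, dtype=int)            # arbitrary initial policy
    I = np.eye(n)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[pi, np.arange(n), :]      # (n, n): row s follows action pi[s]
        r_pi = r[np.arange(n), pi]
        v = np.linalg.solve(I - gamma * P_pi, r_pi)
        # Policy improvement: Q[s, a] = r(s, a) + gamma * E[v(s') | s, a].
        Q = r + gamma * np.einsum('asn,n->sa', P, v)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):     # no improvable state: pi is optimal
            return pi
        pi = new_pi                        # Howard's PI: update ALL states at once
    return pi
```

Howard’s variant is the greedy, all-states update; the other variant studied in the paper (simplex-style PI) would instead change the action in only one state per iteration.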