Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration

Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we explain briefly what Anderson acceleration is and how it …