On-line Policy Improvement using Monte-Carlo Search
We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is …
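As an illustration of the procedure described in the abstract, the following minimal Python sketch estimates the long-term reward of each candidate action by averaging simulated trials in which the base (initial) policy makes every decision after the first, then picks the action with the highest estimate. It is a sketch under stated assumptions, not the authors' implementation: the function names (`rollout_return`, `improved_action`), the toy chain environment (`chain_step`), the uniformly random base policy, the trial count, and the horizon are all hypothetical choices made for this example.

```python
import random


def rollout_return(state, first_action, step, base_policy, rng, horizon):
    """Simulate one trial: take first_action, then follow base_policy.

    Returns the undiscounted sum of rewards collected in the trial.
    """
    total = 0.0
    action = first_action
    for _ in range(horizon):
        state, reward, done = step(state, action, rng)
        total += reward
        if done:
            break
        # Every decision after the first is made by the base policy.
        action = base_policy(state, rng)
    return total


def improved_action(state, actions, step, base_policy,
                    n_trials=500, horizon=50, seed=0):
    """Estimate each action's expected return by Monte-Carlo averaging.

    Returns the action with the highest estimate and all estimates.
    n_trials, horizon, and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    estimates = {}
    for action in actions:
        returns = [
            rollout_return(state, action, step, base_policy, rng, horizon)
            for _ in range(n_trials)
        ]
        estimates[action] = sum(returns) / n_trials
    best = max(estimates, key=estimates.get)
    return best, estimates


# Hypothetical toy problem: a noisy walk on positions 0..4.
# Reaching position 4 pays reward 1; reaching position 0 pays 0.
GOAL = 4


def chain_step(state, action, rng):
    """Move one step; with probability 0.2 the move goes the wrong way."""
    move = 1 if action == "right" else -1
    if rng.random() < 0.2:
        move = -move
    nxt = state + move
    if nxt >= GOAL:
        return GOAL, 1.0, True
    if nxt <= 0:
        return 0, 0.0, True
    return nxt, 0.0, False


def random_policy(state, rng):
    """Base policy for the sketch: choose a direction uniformly at random."""
    return rng.choice(["left", "right"])


if __name__ == "__main__":
    best, estimates = improved_action(
        2, ["left", "right"], chain_step, random_policy
    )
    print(best, estimates)
```

In this toy setting the base policy has no preference between directions, yet the Monte-Carlo estimates should favor moving right from the middle position, so the selected action improves on the base policy's choice. Because the trials for different actions are independent of one another, they could in principle be run in parallel.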