Gradient Temporal Difference with Momentum: Stability and Convergence
Gradient Temporal Difference with Momentum: Stability and Convergence
Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy ball momentum term and provide choice of step size and momentum parameter that ensures almost sure convergence of …