Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth
Function Approximation
Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. It was originally proposed with linear function approximation and was later extended to general smooth function approximation. The asymptotic convergence for the on-policy setting with general smooth function approximation …