Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. It was originally proposed with linear function approximation and was later extended to general smooth function approximation. The asymptotic convergence for the on-policy setting with general smooth function approximation …