Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the associated temporal-difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates …
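Below is a toy illustration of the martingale view. It is our own sketch and not the authors' algorithm. Take the assumed dynamics dX_t = dW_t on [0, T], with running reward r(x) = x^2 and no terminal reward. The value function is then J(t, x) = x^2 (T - t) + (T - t)^2 / 2, and PE amounts to requiring that J(t, X_t) + ∫_0^t X_s^2 ds be a martingale.

On an Euler grid, the per-step increments J(t_{k+1}, X_{k+1}) - J(t_k, X_k) + X_k^2 Δt play the role of TD errors. Summed along a path, they should average to roughly zero for the true J (up to a small discretization bias) but not for a wrong candidate. The dynamics, the reward, the helper names `true_value`, `wrong_value` and `mean_total_td`, and all parameter values are hypothetical choices made only for this example.

```python
import math
import random

# Hypothetical toy problem (not from the paper): dX_t = dW_t on [0, T],
# running reward r(x) = x**2, zero terminal reward.
T = 1.0


def true_value(t: float, x: float) -> float:
    """J(t, x) = E[ int_t^T X_s^2 ds | X_t = x ] for standard Brownian motion."""
    tau = T - t
    return x * x * tau + 0.5 * tau * tau


def wrong_value(t: float, x: float) -> float:
    """A deliberately wrong candidate: drops the (T - t)^2 / 2 term."""
    return x * x * (T - t)


def mean_total_td(value, x0: float, n_steps: int, n_paths: int, seed: int = 0) -> float:
    """Average over simulated paths of the summed TD errors
    value(t_{k+1}, X_{k+1}) - value(t_k, X_k) + X_k^2 * dt.

    The sum telescopes to M_T - M_0 for the discretized process
    M_t = value(t, X_t) + int_0^t X_s^2 ds, so its mean should be close to
    zero when `value` satisfies the martingale condition.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, t = x0, 0.0
        for _ in range(n_steps):
            # Euler step for dX = dW (exact in distribution, since there is no drift).
            x_next = x + math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # TD error: change in the candidate value plus reward accrued over dt.
            total += value(t + dt, x_next) - value(t, x) + x * x * dt
            x, t = x_next, t + dt
    return total / n_paths


if __name__ == "__main__":
    for name, fn in [("true J", true_value), ("wrong J", wrong_value)]:
        print(name, round(mean_total_td(fn, x0=0.5, n_steps=100, n_paths=2000), 3))
```

Running the script should print a mean near zero for `true_value` and roughly 0.5 for `wrong_value`. The small residual for the true J comes from the left-endpoint approximation of the reward integral, and it shrinks as `n_steps` grows.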