Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Demystifying the Curse of Horizon in Offline Reinforcement Learning in Order to Break It
Offline reinforcement learning (RL), where we evaluate and learn new policies using existing off-policy data, is crucial in applications where experimentation is challenging and simulation unreliable, such as medicine. It is also notoriously difficult because the …