Action Gaps and Advantages in Continuous-Time Distributional
Reinforcement Learning
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive …