Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Type: Preprint

Publication Date: 2024-10-14

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.11022


Abstract

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. As a result, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases. We quantify the rate of collapse of these return distributions and show that their statistics collapse at different rates. Moreover, we define distributional perspectives on action gaps and advantages. In particular, we introduce the superiority as a probabilistic generalization of the advantage -- the core object of approaches to mitigating performance issues in high-frequency value-based RL. In addition, we build a superiority-based DRL algorithm. Through simulations in an option-trading domain, we validate that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.
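Background (standard expected-value definitions, not excerpted from the preprint): with decision interval h, the advantage and the action gap are the quantities whose shrinkage degrades high-frequency value-based control, and the abstract describes the superiority as a distributional generalization of the former. A minimal rendering of the usual definitions, assuming the conventional notation Q^π_h, V^π_h, Q*_h:

```latex
% Expected-value background at decision interval h; the preprint's
% distributional objects generalize these, and its exact statements
% and collapse rates are in the paper itself.
\[
  A^{\pi}_{h}(x,a) \;=\; Q^{\pi}_{h}(x,a) - V^{\pi}_{h}(x)
  \qquad \text{(advantage)},
\]
\[
  \operatorname{gap}_{h}(x) \;=\; Q^{*}_{h}(x,a^{*}) - \max_{a \neq a^{*}} Q^{*}_{h}(x,a),
  \qquad a^{*} \in \operatorname*{arg\,max}_{a} Q^{*}_{h}(x,a)
  \qquad \text{(action gap)}.
\]
```

In the expected-value setting it is known from prior work on time discretization (e.g., advantage updating) that A^π_h vanishes as h → 0, which is why that line of work rescales it by 1/h; the abstract's contribution is the distributional analogue, quantifying how whole return distributions and their statistics collapse.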
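The collapse statement can also be illustrated empirically. The sketch below is a hypothetical toy experiment, not the paper's option-trading domain or algorithm: a one-dimensional controlled random walk with terminal reward, a uniformly random policy over {-1, +1}, and the empirical 1-Wasserstein distance between the action-conditioned return distribution (first action pinned) and the policy's return distribution. All names (sample_returns, w1), dynamics, and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_returns(h, first_action=None, n=20_000, T=1.0, sigma=0.5):
    """Monte Carlo returns for a toy controlled diffusion, decisions every h.

    Euler-Maruyama dynamics: x <- x + a*h + sigma*sqrt(h)*noise, with the
    action a resampled uniformly from {-1, +1} at each decision point.
    If first_action is given, the first decision is pinned to it.
    Reward is terminal only: r = -x_T**2 (undiscounted for simplicity).
    """
    steps = int(round(T / h))
    x = np.zeros(n)
    for k in range(steps):
        a = rng.choice([-1.0, 1.0], size=n)
        if k == 0 and first_action is not None:
            a[:] = first_action
        x += a * h + sigma * np.sqrt(h) * rng.standard_normal(n)
    return -x**2

def w1(u, v):
    """Empirical 1-Wasserstein distance between two equal-size samples."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

# As h shrinks, the action-conditioned return distribution drifts toward
# the policy's return distribution: the W1 distance tends to 0.
for h in (0.5, 0.1, 0.02, 0.004):
    g_pi = sample_returns(h)                   # policy return samples
    g_a = sample_returns(h, first_action=1.0)  # first action pinned to +1
    print(f"h = {h:6.3f}   W1(conditioned, policy) = {w1(g_a, g_pi):.4f}")
```

On this toy problem the printed distance shrinks with h, matching the intuition that a single pinned action steers the trajectory only for a duration of h; the paper's theorems make the rate of this collapse, and the differing rates of its statistics, precise.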

Locations

  • arXiv (Cornell University)

Similar Works

  • A Distributional Perspective on Reinforcement Learning (2017) - Marc G. Bellemare, Will Dabney, Rémi Munos
  • One-Step Distributional Reinforcement Learning (2023) - Mastane Achab, Réda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Éric Moulines
  • Direct Advantage Estimation (2021) - Hsiao-Ru Pan, Nico Gürtler, Alexander Neitz, Bernhard Schölkopf
  • Statistics and Samples in Distributional Reinforcement Learning (2019) - Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney
  • Normality-Guided Distributional Reinforcement Learning for Continuous Control (2022) - Ju-Seung Byun, Andrew Perrault
  • Distributional Reinforcement Learning via Moment Matching (2020) - Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh
  • Distributional Reinforcement Learning via Moment Matching (2021) - Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh
  • An Analysis of Categorical Distributional Reinforcement Learning (2018) - Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
  • How Does Value Distribution in Distributional Reinforcement Learning Help Optimization? (2022) - Ke Sun, Bei Jiang, Linglong Kong
  • Risk Perspective Exploration in Distributional Reinforcement Learning (2022) - Jihwan Oh, Joonkee Kim, Se-Young Yun
  • Stochastically Dominant Distributional Reinforcement Learning (2019) - John D. Martin, Michal Lyskawinski, Xiaohu Li, Brendan Englot
  • Skill or Luck? Return Decomposition via Advantage Functions (2024) - Hsiao-Ru Pan, Bernhard Schölkopf
  • Loss Dynamics of Temporal Difference Reinforcement Learning (2023) - Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
  • Distributional Reinforcement Learning With Quantile Regression (2018) - Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
  • Distributional Reinforcement Learning With Quantile Regression (2017) - Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
  • Learning Dynamics and Generalization in Reinforcement Learning (2022) - Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal
  • Beyond Expected Return: Accounting for Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2024) - Manon Flageat, Bryan Lim, Antoine Cully
  • Beyond Expected Return: Accounting for Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2023) - Manon Flageat, Bryan Lim, Antoine Cully

Cited by (0)


Citing (0)
