Tree Search-Based Policy Optimization under Stochastic Execution Delay

Type: Preprint

Publication Date: 2024-04-08

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2404.05440


Abstract

The standard formulation of Markov decision processes (MDPs) assumes that the agent's decisions are executed immediately. However, in numerous realistic applications such as robotics or healthcare, actions are performed only after a delay, whose duration may itself be random. In this work, we introduce stochastic delayed execution MDPs, a new formalism addressing random delays without resorting to state augmentation. We show that, given observed delay values, it suffices to search over the class of Markov policies to reach optimal performance, thus extending the deterministic fixed-delay case. Armed with this insight, we devise DEZ, a model-based algorithm that optimizes over the class of Markov policies. DEZ leverages Monte-Carlo tree search, like its non-delayed variant EfficientZero, to accurately infer future states from the action queue. It thus handles delayed execution while preserving the sample efficiency of EfficientZero. Through a series of experiments on the Atari suite, we demonstrate that although the previous baseline outperforms the naive method under constant delay, it underperforms in the face of stochastic delays. In contrast, our approach significantly outperforms the baselines for both constant and stochastic delays. The code is available at http://github.com/davidva1/Delayed-EZ.
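The core idea the abstract describes can be sketched in a few lines: before choosing its next action, the agent rolls a learned dynamics model forward through the queue of actions that are already committed but not yet executed, and queries a Markov policy at that predicted future state. This is only an illustrative sketch, not the authors' code; `step_model`, `select_delayed_action`, and the toy policy below are hypothetical stand-ins for DEZ's learned components.

```python
from collections import deque


def step_model(state, action):
    # Hypothetical stand-in for a learned dynamics model: a real
    # implementation would predict the next latent state; here we
    # simply record the action in the state tuple.
    return state + (action,)


def select_delayed_action(state, action_queue, policy):
    """Predict the state at which the next chosen action will actually
    execute by unrolling the model through the pending action queue,
    then query a Markov policy at that predicted state."""
    predicted = state
    for queued_action in action_queue:  # committed but not yet executed
        predicted = step_model(predicted, queued_action)
    return policy(predicted)


# Usage: with two pending actions, the policy is evaluated at the
# model-predicted future state rather than the currently observed one.
queue = deque([0, 1])
action = select_delayed_action((42,), queue, policy=lambda s: len(s) % 4)
```

In DEZ itself, the single model step above is replaced by EfficientZero-style Monte-Carlo tree search from the predicted state; the sketch only shows why observed delays let a Markov policy suffice without augmenting the state.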

Locations

  • arXiv (Cornell University)

Similar Works

  • Acting in Delayed Environments with Non-Stationary Markov Policies (2021) — Gal Dalal, Esther Derman, Shie Mannor
  • Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree (2024) — Lang Feng, Pengjie Gu, Bo An, Gang Pan
  • Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays (2021) — Somjit Nath, Mayank Baranwal, Harshad Khadilkar
  • ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (2024) — Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
  • ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (2023) — Chenxiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
  • Decision Making in Non-Stationary Environments with Policy-Augmented Search (2024) — Ava Pettet, Yunuo Zhang, Baiting Luo, Kyle Hollins Wray, Hendrik Baier, Áron Lászka, Abhishek Dubey, Ayan Mukhopadhyay
  • Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes (2018) — Kunal Menda, Yi‐Chun Chen, Justin Grana, James W. Bono, Brendan Tracey, Mykel J. Kochenderfer, David H. Wolpert
  • Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes (2024) — Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay
  • Anytime Integrated Task and Motion Policies for Stochastic Environments (2019) — Naman Shah, Deepak Kala Vasudevan, K. Kalyan Kumar, Pranav Kamojjhala, Siddharth Srivastava
  • Anytime Integrated Task and Motion Policies for Stochastic Environments (2020) — Naman Shah, Deepak Kala Vasudevan, K. Kalyan Kumar, Pranav Kamojjhala, Siddharth Srivastava
  • Multi-Task Option Learning and Discovery for Stochastic Path Planning (2022) — Naman Shah, Siddharth Srivastava
  • Anytime Stochastic Task and Motion Policies (2021) — Naman Shah, Siddharth Srivastava
  • Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods (2023) — Debraj Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, Guillermo A. Pérez
  • A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees (2024) — Marcus Hoerger, Hanna Kurniawati, Dirk P. Kroese, Nan Ye
  • A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees (2023) — Marcus Hoerger, Hanna Kurniawati, Dirk P. Kroese, Nan Ye
  • L4KDE: Learning for KinoDynamic Tree Expansion (2022) — Tin Lai, Weiming Zhi, Tucker Hermans, Fábio Ramos
  • Monte Carlo Tree Search with Boltzmann Exploration (2024) — Michael K. Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
  • SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search (2023) — Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

Cited by (0)

Citing (0)