Tree Search-Based Policy Optimization under Stochastic Execution Delay

Type: Preprint

Publication Date: 2024-04-08

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2404.05440


Abstract

The standard formulation of Markov decision processes (MDPs) assumes that the agent's decisions are executed immediately. However, in numerous realistic applications such as robotics or healthcare, actions are performed only after a delay, whose duration may itself be random. In this work, we introduce stochastic delayed execution MDPs, a new formalism addressing random delays without resorting to state augmentation. We show that, given observed delay values, it suffices to search over the class of Markov policies to reach optimal performance, thus extending the deterministic fixed-delay case. Armed with this insight, we devise DEZ, a model-based algorithm that optimizes over the class of Markov policies. DEZ leverages Monte-Carlo tree search, like its non-delayed variant EfficientZero, to accurately infer future states from the action queue. It thus handles delayed execution while preserving the sample efficiency of EfficientZero. Through a series of experiments on the Atari suite, we demonstrate that although the previous baseline outperforms the naive method under constant delay, it underperforms in the face of stochastic delays. In contrast, our approach significantly outperforms the baselines for both constant and stochastic delays. The code is available at http://github.com/davidva1/Delayed-EZ.
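The core idea the abstract describes can be sketched in a few lines: before choosing its next action, the agent rolls a learned dynamics model forward through the queue of actions that are already committed but not yet executed, and queries a Markov policy at that predicted future state. This is only an illustrative sketch, not the authors' code; `step_model`, `select_delayed_action`, and the toy policy below are hypothetical stand-ins for DEZ's learned components.

```python
from collections import deque


def step_model(state, action):
    # Hypothetical stand-in for a learned dynamics model: a real
    # implementation would predict the next latent state; here we
    # simply record the action in the state tuple.
    return state + (action,)


def select_delayed_action(state, action_queue, policy):
    """Predict the state at which the next chosen action will actually
    execute by unrolling the model through the pending action queue,
    then query a Markov policy at that predicted state."""
    predicted = state
    for queued_action in action_queue:  # committed but not yet executed
        predicted = step_model(predicted, queued_action)
    return policy(predicted)


# Usage: with two pending actions, the policy is evaluated at the
# model-predicted future state rather than the currently observed one.
queue = deque([0, 1])
action = select_delayed_action((42,), queue, policy=lambda s: len(s) % 4)
```

In DEZ itself, the single model step above is replaced by EfficientZero-style Monte-Carlo tree search from the predicted state; the sketch only shows why observed delays let a Markov policy suffice without augmenting the state.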

Locations

  • arXiv (Cornell University)

Similar Works

  • Acting in Delayed Environments with Non-Stationary Markov Policies (2021) — Gal Dalal, Esther Derman, Shie Mannor
  • Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree (2024) — Lang Feng, Pengjie Gu, Bo An, Gang Pan
  • Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays (2021) — Somjit Nath, Mayank Baranwal, Harshad Khadilkar
  • ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (2024) — Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
  • ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (2023) — Chenxiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
  • Decision Making in Non-Stationary Environments with Policy-Augmented Search (2024) — Ava Pettet, Yunuo Zhang, Baiting Luo, Kyle Hollins Wray, Hendrik Baier, Áron Lászka, Abhishek Dubey, Ayan Mukhopadhyay
  • Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes (2018) — Kunal Menda, Yi‐Chun Chen, Justin Grana, James W. Bono, Brendan Tracey, Mykel J. Kochenderfer, David H. Wolpert
  • Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes (2024) — Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay
  • Anytime Integrated Task and Motion Policies for Stochastic Environments (2019) — Naman Shah, Deepak Kala Vasudevan, K. Kalyan Kumar, Pranav Kamojjhala, Siddharth Srivastava
  • Anytime Integrated Task and Motion Policies for Stochastic Environments (2020) — Naman Shah, Deepak Kala Vasudevan, K. Kalyan Kumar, Pranav Kamojjhala, Siddharth Srivastava
  • Multi-Task Option Learning and Discovery for Stochastic Path Planning (2022) — Naman Shah, Siddharth Srivastava
  • Anytime Stochastic Task and Motion Policies (2021) — Naman Shah, Siddharth Srivastava
  • Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods (2023) — Debraj Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, Guillermo A. Pérez
  • A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees (2024) — Marcus Hoerger, Hanna Kurniawati, Dirk P. Kroese, Nan Ye
  • A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees (2023) — Marcus Hoerger, Hanna Kurniawati, Dirk P. Kroese, Nan Ye
  • L4KDE: Learning for KinoDynamic Tree Expansion (2022) — Tin Lai, Weiming Zhi, Tucker Hermans, Fábio Ramos
  • Monte Carlo Tree Search with Boltzmann Exploration (2024) — Michael K. Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
  • SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search (2023) — Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

Cited by (0)

Citing (0)