Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

Type: Article

Publication Date: 2022-12-06

Citations: 3

DOI: https://doi.org/10.1109/cdc51059.2022.9992837

Abstract

We consider a discounted-cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method, where the gradients of both the primal and dual variables are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrange multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes), and their iterates are all intertwined. Our main contribution is a finite-time analysis of the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate O(1/K^{1/6}), where K is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
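
The following is a minimal illustrative sketch of the update structure the abstract describes: a critic, an actor, and a dual variable advanced simultaneously from one online trajectory on three separated time scales. It is not the authors' exact algorithm; the synthetic CMDP, the SARSA-style critic, the step-size exponents, and the constraint level b are all assumptions chosen for illustration.

```python
# Hedged sketch of an online primal-dual natural actor-critic loop on a
# small synthetic CMDP (not the paper's exact algorithm or step sizes).
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, b = 5, 3, 0.9, 1.0               # assumed problem sizes and constraint level
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = distribution over next states
R = rng.random((nS, nA))                        # reward to be maximized
G = rng.random((nS, nA))                        # utility; constraint: discounted utility >= b

theta = np.zeros((nS, nA))  # actor (primal variable): softmax policy parameters
q_r = np.zeros((nS, nA))    # critic estimate of the reward Q-function
q_g = np.zeros((nS, nA))    # critic estimate of the utility Q-function
lam = 0.0                   # dual variable (Lagrange multiplier)

def policy(s):
    """Softmax policy over actions at state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for k in range(1, 50_001):
    # Three time scales: critic fastest, actor slower, dual slowest.
    a_critic, a_actor, a_dual = k**-0.4, k**-0.6, k**-0.8
    pi = policy(s)
    a = rng.choice(nA, p=pi)
    s2 = rng.choice(nS, p=P[s, a])
    a2 = rng.choice(nA, p=policy(s2))
    # TD(0) critic updates computed along the single online trajectory.
    q_r[s, a] += a_critic * (R[s, a] + gamma * q_r[s2, a2] - q_r[s, a])
    q_g[s, a] += a_critic * (G[s, a] + gamma * q_g[s2, a2] - q_g[s, a])
    # Natural policy gradient step on the Lagrangian: for tabular softmax
    # policies it reduces to adding the estimated advantage at the visited state.
    q_mix = q_r[s] + lam * q_g[s]
    theta[s] += a_actor * (q_mix - pi @ q_mix)
    # Projected dual step: lam shrinks when the constraint estimate is satisfied.
    lam = max(0.0, lam - a_dual * (pi @ q_g[s] - b))
    s = s2
```

The step-size separation (critic, actor, and dual decaying at different polynomial rates) mirrors the multi-time-scale coupling described in the abstract; the specific exponents here are placeholders, not the schedule under which the paper's O(1/K^{1/6}) rate is derived.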

Locations

  • arXiv (Cornell University)
  • 2022 IEEE 61st Conference on Decision and Control (CDC)

Similar Works

  • Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes (2021). Sihan Zeng, Thinh T. Doan, Justin Romberg
  • Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs (2022). Dong-Sheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović
  • Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm (2022). Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
  • Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm (2023). Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
  • Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms (2023). Prashansa Panda, Shalabh Bhatnagar
  • Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs (2023). Dong-Sheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro
  • A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees (2024). Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
  • A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes (2023). Sami Khairy, Prasanna Balaprakash, Lin X. Cai
  • Policy Optimization for Constrained MDPs with Provable Fast Global Convergence (2021). Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian
  • A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP (2022). Fan Chen, Junyu Zhang, Zaiwen Wen
  • A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning (2024). Sihan Zeng, Thinh T. Doan, Justin Romberg
  • Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process (2021). Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan
  • A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic (2020). Mingyi Hong, Hoi To Wai, Zhaoran Wang, Zhuoran Yang
  • Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning (2016). Yi-Chen Chen, Mengdi Wang
  • Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes (2022). Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen
  • A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning (2021). Sihan Zeng, Thinh T. Doan, Justin Romberg
  • Sample-Efficient Constrained Reinforcement Learning with General Parameterization (2024). Washim Uddin Mondal, Vaneet Aggarwal
  • Algorithm for Constrained Markov Decision Process with Linear Convergence (2022). Egor Gladin, Maksim Lavrik-Karmazin, Karina Eduardovna Zainullina, Varvara D. Rudenko, Alexander Gasnikov, Martin Takáč
  • Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality (2023). Hyeong Soo Chang
  • Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees (2023). Tingting Ni, Maryam Kamgarpour