Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Type: Preprint
Publication Date: 2021-03-24
Citations: 7
Location: arXiv (Cornell University)
Similar Works
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation (2021): Haque Ishfaq, Qiwen Cui, Việt Dũng Nguyễn, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning (2020): Alekh Agarwal, Mikael Henaff, Sham M. Kakade, Wen Sun
- Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL (2023): Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
- Provably Efficient Exploration in Policy Optimization (2019): Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
- Is Pessimism Provably Efficient for Offline RL? (2020): Ying Jin, Zhuoran Yang, Zhaoran Wang
- Provably Correct Optimization and Exploration with Non-linear Policies (2021): Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang
- Efficient iterative policy optimization (2016): Nicolas Le Roux
- Bi-Level Offline Policy Optimization with Limited Exploration (2023): Wenzhuo Zhou
- Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization (2024): Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, Ramakrishnan Srikant
- Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs (2021): Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári
- A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes (2023): Han Zhong, Tong Zhang
- Model-Free Active Exploration in Reinforcement Learning (2024): Alessio Russo, Alexandre Proutière
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation (2022): Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
- Towards Tractable Optimism in Model-Based Reinforcement Learning (2020): Aldo Pacchiano, Philip Ball, Jack Parker-Holder, Krzysztof Choromański, Stephen Roberts
Works That Cite This (7)
- Corruption-Robust Offline Reinforcement Learning (2021): Xuezhou Zhang, Yiding Chen, Junwei Zhu, W. Sun
- MADE: Exploration via Maximizing Deviation from Explored Regions (2021): Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph E. Gonzalez, Stuart Russell
- Representation Learning for Online and Offline RL in Low-rank MDPs (2021): Masatoshi Uehara, Xuezhou Zhang, W. Sun
- On the Global Convergence of Momentum-based Policy Gradient (2021): Yuhao Ding, Junzi Zhang, Javad Lavaei
- Navigating to the Best Policy in Markov Decision Processes (2021): Aymen Al Marjani, Aurélien Garivier, Alexandre Proutière
- Design of Experiments for Stochastic Contextual Linear Bandits (2021): Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill
- Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses (2021): Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee
Works Cited by This (29)
- Contextual Bandit Algorithms with Supervised Learning Guarantees (2010): Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire
- Contextual Decision Processes with Low Bellman Rank are PAC-Learnable (2016): Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
- A unified view of entropy-regularized Markov decision processes (2017): Gergely Neu, Anders Jönsson, Vicenç Gómez
- Proximal Policy Optimization Algorithms (2017): John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
- Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator (2018): Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi
- A Theory of Regularized Markov Decision Processes (2019): Matthieu Geist, Bruno Scherrer, Olivier Pietquin
- Global Optimality Guarantees for Policy Gradient Methods (2024): Jalaj Bhandari, Daniel Russo
- Is Q-learning Provably Efficient? (2018): Chi Jin, Zeyuan Allen-Zhu, Sébastien Bubeck, Michael I. Jordan
- Policy Certificates: Towards Accountable Reinforcement Learning (2018): Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
- Fast rates with high probability in exp-concave statistical learning (2017): Nishant A. Mehta