The Sample Complexity of Teaching by Reinforcement on Q-Learning

Type: Article

Publication Date: 2021-05-18

Citations: 2

DOI: https://doi.org/10.1609/aaai.v35i12.17306

View Chat PDF

Abstract

We study the sample complexity of teaching, termed as ``teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the teaching-by-demonstration paradigm motivated by robotics applications, where the teacher teaches by providing demonstrations of state/action trajectories. The teaching-by-reinforcement paradigm applies to a wider range of real-world settings where a demonstration is inconvenient, but has not been studied systematically. In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, and characterize the TDim under different teachers with varying control power over the environment, and present matching optimal teaching algorithms. Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results. Our teaching algorithms have the potential to speed up RL agent learning in applications where a helpful teacher is available.

Locations

  • Proceedings of the AAAI Conference on Artificial Intelligence - View - PDF
  • arXiv (Cornell University) - View - PDF

Similar Works

Action Title Year Authors
+ The Sample Complexity of Teaching-by-Reinforcement on Q-Learning 2020 Xuezhou Zhang
Shubham Bharti
Yuzhe Ma
Adish Singla
Xiaojin Zhu
+ The Teaching Dimension of Q-learning 2020 Xuezhou Zhang
Shubham Bharti
Yuzhe Ma
Adish Singla
Xiaojin Zhu
+ PAC Reinforcement Learning without Real-World Feedback 2019 Yuren Zhong
Aniket Anand Deshmukh
Clayton Scott
+ PDF Chat Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis 2023 Gen Li
Changxiao Cai
Yuxin Chen
Yuting Wei
Yuejie Chi
+ Efficient Online Reinforcement Learning with Offline Data 2023 Philip Ball
Laura Smith
Ilya Kostrikov
Sergey Levine
+ Jump-Start Reinforcement Learning 2022 Ikechukwu Uchendu
Ted Xiao
Yao Lu
Banghua Zhu
Mengyuan Yan
Joséphine Simon
Matthew Bennice
Chuyuan Fu
Cong Ma
Jiantao Jiao
+ Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity 2022 Abhishek Gupta
Aldo Pacchiano
Yuexiang Zhai
Sham M. Kakade
Sergey Levine
+ Demonstration-Regularized RL 2023 Daniil Tiapkin
Denis Belomestny
Daniele Calandriello
Éric Moulines
Alexey Naumov
Pierre Perrault
Michal Vaľko
Pierre Ménard
+ Sequence Modeling is a Robust Contender for Offline Reinforcement Learning 2023 Prajjwal Bhargava
Rohan Chitnis
Alborz Geramifard
Shagun Sodhani
Amy Zhang
+ Understanding the Complexity Gains of Single-Task RL with a Curriculum 2022 Qiyang Li
Yuexiang Zhai
Yi Ma
Sergey Levine
+ Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning 2023 Dingwen Kong
Lin F. Yang
+ Offline Reinforcement Learning at Multiple Frequencies 2022 Kaylee Burns
Tianhe Yu
Chelsea Finn
Karol Hausman
+ Task-agnostic Exploration in Reinforcement Learning 2020 Xuezhou Zhang
Yuzhe Ma
Adish Singla
+ Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? 2023 Lei Zhao
Mengdi Wang
Yu Bai
+ Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications 2018 Daniel S. Brown
Scott Niekum
+ Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications 2018 Daniel S. Brown
Scott Niekum
+ Autonomous Reinforcement Learning via Subgoal Curricula 2021 Archit Sharma
Abhishek Gupta
Sergey Levine
Karol Hausman
Chelsea Finn
+ Autonomous Reinforcement Learning via Subgoal Curricula. 2021 Archit Sharma
Abhishek Gupta
Sergey Levine
Karol Hausman
Chelsea Finn
+ Guarded Policy Optimization with Imperfect Online Demonstrations 2023 Zhenghai Xue
Zhenghao Peng
Quanyi Li
Zhihan Liu
Bolei Zhou
+ Persistent Reinforcement Learning via Subgoal Curricula 2021 Archit Sharma
Abhishek Gupta
Sergey Levine
Karol Hausman
Chelsea Finn

Citing (22)

Action Title Year Authors
+ Dynamic Teaching in Sequential Decision Making Environments 2012 Thomas J. Walsh
Sergiu Goschin
+ PDF Chat Informing sequential clinical decision-making through reinforcement learning: an empirical study 2010 Susan M. Shortreed
Eric B. Laber
Daniel J. Lizotte
T. Scott Stroup
Joëlle Pineau
Susan A. Murphy
+ Iterative Machine Teaching 2017 Weiyang Liu
Bo Dai
Ahmad Humayun
Charlene Tay
Yu Chen
Linda B. Smith
James M. Rehg
Le Song
+ An Overview of Machine Teaching 2018 Xiaojin Zhu
Adish Singla
Sandra Zilles
Anna N. Rafferty
+ Data Poisoning Attacks against Online Learning 2018 Yizhen Wang
Kamalika Chaudhuri
+ Online Data Poisoning Attack 2019 Xuezhou Zhang
Xiaojin Zhu
Laurent Lessard
+ Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction 2017 Wen Sun
Arun Venkatraman
Geoffrey J. Gordon
Byron Boots
J. Andrew Bagnell
+ Adversarial Attacks on Stochastic Bandits 2018 Kwang-Sung Jun
Lihong Li
Yuzhe Ma
Xiaojin Zhu
+ Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners 2018 Yuxin Chen
Adish Singla
Oisin Mac Aodha
Pietro Perona
Yisong Yue
+ PDF Chat Data Poisoning Attacks in Contextual Bandits 2018 Yuzhe Ma
Kwang-Sung Jun
Lihong Li
Xiaojin Zhu
+ Is Q-learning Provably Efficient? 2018 Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
+ An Optimal Control Approach to Sequential Machine Teaching 2018 Laurent Lessard
Xuezhou Zhang
Xiaojin Zhu
+ PDF Chat Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications 2019 Daniel S. Brown
Scott Niekum
+ Interactive Teaching Algorithms for Inverse Reinforcement Learning 2019 Parameswaran Kamalaruban
Rati Devidze
Volkan Cevher
Adish Singla
+ Teaching Multiple Concepts to a Forgetful Learner 2018 Anette Hunziker
Yuxin Chen
Oisin Mac Aodha
Manuel Gomez-Rodriguez
Andreas Krause
Pietro Perona
Yisong Yue
Adish Singla
+ Policy Poisoning in Batch Reinforcement Learning and Control 2019 Yuzhe Ma
Xuezhou Zhang
Wen Sun
Xiaojin Zhu
+ Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models 2019 Farnam Mansouri
Yuxin Chen
Ara Vartanian
Xiaojin Zhu
Adish Singla
+ Adaptive Reward-Poisoning Attacks against Reinforcement Learning 2020 Xuezhou Zhang
Yuzhe Ma
Adish Singla
Junwei Zhu
+ Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning 2020 Amin Rakhsha
Goran Radanović
Rati Devidze
Junwei Zhu
Adish Singla
+ Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners 2020 Yun-Shiuan Chuang
Xuezhou Zhang
Yuzhe Ma
Mark K. Ho
Joseph L. Austerweil
Xiaojin Zhu
+ Toward the Fundamental Limits of Imitation Learning 2020 Nived Rajaraman
Lin F. Yang
Jiantao Jiao
Ramachandran Kannan
+ Machine Teaching of Active Sequential Learners 2018 Tomi Peltola
Mustafa Mert Çelikok
Pedram Daee
Samuel Kaski