Observational Overfitting in Reinforcement Learning

Type: Preprint

Publication Date: 2019-12-05

Citations: 9

Abstract

A major component of overfitting in model-free reinforcement learning (RL) arises when the agent mistakenly correlates reward with spurious features of the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks by modifying only the observation space of an MDP. We term this phenomenon observational overfitting: the agent overfits to different observation spaces even though the underlying MDP dynamics are fixed. Our experiments expose intriguing properties, especially with regard to implicit regularization, and also corroborate results from previous work on RL generalization and supervised learning (SL).
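
To make the setup concrete, the following is a minimal sketch of one way to realize "same dynamics, different observations": a Gym-style observation wrapper that appends level-specific spurious features to the true state. The wrapper, its parameter names, and the use of CartPole are illustrative assumptions, not the paper's actual benchmarks; the point is only that an agent keying on the appended features can earn high training reward yet fail on levels generated with a different seed.

import numpy as np
import gym


class SpuriousObservationWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: appends spurious features to the observation
    while leaving the underlying MDP dynamics untouched."""

    def __init__(self, env, n_spurious=16, level_seed=0):
        super().__init__(env)
        rng = np.random.RandomState(level_seed)
        obs_dim = env.observation_space.shape[0]
        # Level-specific random projection: differs across train/test levels,
        # so features built from it carry no generalizable reward signal.
        self.spurious_proj = rng.randn(obs_dim, n_spurious)
        low = np.full(obs_dim + n_spurious, -np.inf, dtype=np.float32)
        high = np.full(obs_dim + n_spurious, np.inf, dtype=np.float32)
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        # Concatenate the true state with its level-specific projection.
        spurious = obs @ self.spurious_proj
        return np.concatenate([obs, spurious]).astype(np.float32)


# Train on one level seed, evaluate on a held-out one; a policy that relies
# on the spurious block will not transfer, even though dynamics are identical.
train_env = SpuriousObservationWrapper(gym.make("CartPole-v1"), level_seed=0)
test_env = SpuriousObservationWrapper(gym.make("CartPole-v1"), level_seed=1)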

Locations

  • arXiv (Cornell University)

Similar Works

  • Observational Overfitting in Reinforcement Learning (2019): Xingyou Song, Yiding Jiang, Yilun Du, Behnam Neyshabur
  • A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning (2018): Amy Zhang, Nicolas Ballas, Joëlle Pineau
  • A Study on Overfitting in Deep Reinforcement Learning (2018): Chiyuan Zhang, Oriol Vinyals, Rémi Munos, Samy Bengio
  • Quantifying Generalization in Reinforcement Learning (2018): Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman
  • The Difficulty of Passive Learning in Deep Reinforcement Learning (2021): Georg Ostrovski, Pablo Samuel Castro, Will Dabney
  • Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? (2023): Gunshi Gupta, Tim G. J. Rudner, Rowan McAllister, Adrien Gaidon, Yarin Gal
  • The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret (2024): Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse
  • Discovering Blind Spots in Reinforcement Learning (2018): Ramya Ramakrishnan, Ece Kamar, Debadeepta Dey, Julie Shah, Eric Horvitz
  • The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models (2022): Alexander Pan, Kush Bhatia, Jacob Steinhardt
  • Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning (2022): Bertrand Charpentier, Ransalu Senanayake, Mykel J. Kochenderfer, Stephan Günnemann
  • Efficient Deep Reinforcement Learning Requires Regulating Overfitting (2023): Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine
  • False Correlation Reduction for Offline Reinforcement Learning (2021): Zhi‐Hong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Zhaoran Wang, Jing Jiang
  • False Correlation Reduction for Offline Reinforcement Learning (2023): Zhi‐Hong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
  • Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning (2024): Michal Nauman, Michał Bortkiewicz, Mateusz Ostaszewski, Piotr Miłoś, T. P. Trzcinski, Marek Cygan
  • The Principle of Unchanged Optimality in Reinforcement Learning Generalization (2019): Alex Irpan, Xingyou Song
  • Conservative Q-Learning for Offline Reinforcement Learning (2020): Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine