Dynamic Bottleneck for Robust Self-Supervised Exploration

Type: Preprint

Publication Date: 2021-10-20

Citations: 10

Abstract

Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in solving reinforcement learning problems with sparse rewards. However, such methods are usually sensitive to dynamics-irrelevant information in the environment, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which learns a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) in the linear case, and the visitation count in the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with the DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
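
As a rough sketch of the underlying idea (written in generic information-bottleneck notation, not necessarily the paper's exact objective or symbols), an encoder f_theta compresses the state-action pair (S_t, A_t) into a latent Z_t that is trained to remain predictive of the next state S_{t+1} while discarding other information, and the learned representation then drives an intrinsic bonus added to the extrinsic reward:

    \max_{\theta} \; I\big(Z_t; S_{t+1}\big) \;-\; \beta \, I\big(Z_t; (S_t, A_t)\big), \qquad Z_t = f_{\theta}(S_t, A_t)

    r_t \;=\; r^{\mathrm{ext}}_t \;+\; \lambda \, b^{\mathrm{DB}}(s_t, a_t)

Here beta trades off compression against predictiveness, lambda scales the exploration bonus, and b^DB stands for the information-gain bonus at (s_t, a_t); these coefficients and symbols are illustrative placeholders rather than the notation of the paper.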

Locations

  • arXiv (Cornell University)

Similar Works

  • Dynamic Bottleneck for Robust Self-Supervised Exploration (2021) - Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
  • Information Maximizing Exploration with a Latent Dynamics Model (2018) - Trevor Barron, Oliver Obst, Heni Ben Amor
  • Learning-Driven Exploration for Reinforcement Learning (2019) - Muhammad Usama, Dong Eui Chang
  • DORA The Explorer: Directed Outreaching Reinforcement Action-Selection (2018) - Leshem Choshen, Lior Fox, Yonatan Loewenstein
  • Exploring Unknown States with Action Balance (2020) - Yan Song, Yingfeng Chen, Yujing Hu, Changjie Fan
  • CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning (2024) - Chenyu Sun, Hangwei Qian, Chunyan Miao
  • CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning (2023) - Chenyu Sun, Hangwei Qian, Chunyan Miao
  • Learning-Driven Exploration for Reinforcement Learning (2021) - Muhammad Usama, Dong Eui Chang
  • Effective Exploration Based on the Structural Information Principles (2024) - Xianghua Zeng, Hao Peng, Angsheng Li
  • Diverse Exploration for Fast and Safe Policy Improvement (2018) - Andrew Cohen, Lei Yu, R.W. Wright
  • Diverse Exploration for Fast and Safe Policy Improvement (2018) - Andrew I. Cohen, Yu Lei, Robert Wright
  • Adaptive trajectory-constrained exploration strategy for deep reinforcement learning (2023) - Guojian Wang, Faguo Wu, Xiao Zhang, Ning Guo, Zhiming Zheng
  • A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning (2023) - Siyuan Guo, Yanchao Sun, Jifeng Hu, Sili Huang, Hechang Chen, Haiyin Piao, Lichao Sun, Yi Chang
  • A Temporally Correlated Latent Exploration for Reinforcement Learning (2024) - S. B. Oh, WanSoo Kim, Hyunjin Kim
  • Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy (2019) - Yijie Guo, Jongwook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee
  • DQN with model-based exploration: efficient learning on environments with sparse rewards (2019) - Stephen Gou, Yuyang Liu
  • Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm (2024) - Ting Qiao, Henry Williams, David Valencia, Bruce A. MacDonald