Corruption-Robust Exploration in Episodic Reinforcement Learning
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multi-armed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing RL approaches based …