EVaDE: Event-Based Variational Thompson Sampling for Model-Based
Reinforcement Learning
Posterior Sampling for Reinforcement Learning (PSRL) is a well-known algorithm that augments model-based reinforcement learning (MBRL) with Thompson sampling. PSRL maintains posterior distributions over the environment's transition dynamics and reward function, and these posteriors are intractable for tasks with high-dimensional state and action spaces. Recent works show that dropout, used …