Ask a Question

Prefer a chat interface with context about you and your work?

Sublinear regret for learning POMDPs

Sublinear regret for learning POMDPs

We study the modelā€based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over an infinite horizon. We propose a learning algorithm for this problem, building on spectral ā€¦