Ask a Question

Prefer a chat interface with context about you and your work?

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$ …