Online Markov Decision Processes With Kullback–Leibler Control Cost
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the setup of Todorov, the state-action …
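As a minimal illustration of the action model described above, the sketch below treats an action as a row of next-state probabilities and prices it with a Kullback–Leibler term. It assumes, as in Todorov's linearly solvable formulation, that the per-step cost is a state cost plus the KL divergence between the chosen next-state distribution and a reference ("passive") transition distribution. The names `kl_divergence`, `step_cost`, `passive`, and `chosen`, and the numerical values, are hypothetical and not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Return D(p || q) in nats for two discrete distributions of equal length."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                # p puts mass where q has none, so the divergence is infinite.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


def step_cost(state_cost, chosen_row, passive_row):
    """Assumed per-step cost: state cost plus KL control cost of the chosen row."""
    return state_cost + kl_divergence(chosen_row, passive_row)


# Hypothetical three-state example: rows are next-state distributions
# for a single current state.
passive = [0.5, 0.5, 0.0]  # assumed reference (uncontrolled) transition row
chosen = [0.9, 0.1, 0.0]   # the agent's action: its own next-state distribution

cost = step_cost(1.0, chosen, passive)
```

Under this assumption, choosing the passive row itself incurs no control cost, and any row that assigns probability to a transition the passive dynamics forbid has infinite cost.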