Related Paper

Online Markov Decision Processes With Kullback–Leibler Control Cost

This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the setup of Todorov, the state-action …
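As a rough illustration of the setup described above, the sketch below treats an action as a full probability distribution over next states and scores it with a Kullback-Leibler divergence. The abstract is truncated before it defines the cost, so the reference ("passive") transition matrix, the three-state example, and the helper names `kl_divergence` and `sample_next_state` are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions on the same finite set."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                # p puts mass where q has none, so the divergence is infinite.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


def sample_next_state(chosen_row, rng):
    """Sample the next state index from the distribution the agent chose."""
    return rng.choices(range(len(chosen_row)), weights=chosen_row, k=1)[0]


# Hypothetical reference dynamics on a 3-state space (an assumption, not from the paper).
passive = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

state = 1
# The action is itself a distribution over the next state, given the current state.
action = [0.1, 0.2, 0.7]

# Assumed control cost: divergence of the chosen distribution from the reference row.
control_cost = kl_divergence(action, passive[state])
next_state = sample_next_state(action, random.Random(0))
```

Under this assumed cost, choosing the reference row itself costs nothing, and the cost grows as the agent steers the walk further away from the reference dynamics.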