Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback--Leibler Control Cost
A new approach to computing optimal policies for Markov decision process (MDP) models is introduced. The main idea is to solve not one MDP but an entire family of MDPs, parameterized by a scalar $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family …