Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning
Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning
Deep reinforcement learning has demonstrated remarkable achievements across diverse domains such as video games, robotic control, autonomous driving, and drug discovery. Common methodologies in partially-observable domains largely lean on end-to-end learning from high-dimensional observations, such as images, without explicitly reasoning about true state. We suggest an alternative direction, introducing the …