Do Language Models Have Beliefs? Methods for Detecting, Updating, and
Visualizing Model Beliefs
Do language models have beliefs about the world? Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state. In this paper, we discuss approaches to detecting when models have beliefs about the world, and we …