Controlling Large Language Model Agents with Entropic Activation Steering

Type: Preprint

Publication Date: 2024-05-31

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2406.00244

Abstract

The generality of pretrained large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. To be successful, such agents must form beliefs about how to achieve their goals based on limited interaction with their environment, resulting in uncertainty about the best action to take at each step. In this paper, we study how LLM agents form and act on these beliefs by conducting experiments in controlled sequential decision-making tasks. To begin, we find that LLM agents are overconfident: They draw strong conclusions about what to do based on insufficient evidence, resulting in inadequately explorative behavior. We dig deeper into this phenomenon and show how it emerges from a collapse in the entropy of the action distribution implied by sampling from the LLM. We then demonstrate that existing token-level sampling techniques are by themselves insufficient to make the agent explore more. Motivated by this fact, we introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. EAST computes a steering vector as an entropy-weighted combination of representations, and uses it to manipulate an LLM agent's uncertainty over actions by intervening on its activations during the forward pass. We show that EAST can reliably increase the entropy in an LLM agent's actions, causing more explorative behavior to emerge. Finally, EAST modifies the subjective uncertainty an LLM agent expresses, paving the way to interpreting and controlling how LLM agents represent uncertainty about their decisions.
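
The abstract's one-sentence description of EAST (a steering vector computed as an entropy-weighted combination of representations, added to the model's activations during the forward pass) can be illustrated with a brief sketch. The code below is an illustrative reading of that sentence, not the authors' implementation: the function names, the toy tensors, the layer at which the intervention happens, and the `scale` factor are all assumptions introduced here.

```python
# Minimal sketch of the mechanism described in the abstract, assuming:
# - reps[i] is a hidden representation (e.g., a last-token activation at some
#   layer) collected from the i-th prompt in a set of agent interactions,
# - entropies[i] is the entropy of the action distribution the agent produced
#   for that prompt.
# All names, shapes, and constants are illustrative, not the paper's code.

import torch

def entropy_weighted_steering_vector(reps: torch.Tensor,
                                     entropies: torch.Tensor) -> torch.Tensor:
    """Combine representations, weighting each one by its action-distribution entropy."""
    weights = entropies / entropies.sum()          # normalized entropy weights, shape (n,)
    return (weights.unsqueeze(-1) * reps).sum(0)   # steering vector, shape (d,)

def steer_activations(hidden: torch.Tensor,
                      steering_vec: torch.Tensor,
                      scale: float = 4.0) -> torch.Tensor:
    """Intervene on the forward pass by adding the scaled steering vector
    to the hidden states at a chosen layer."""
    return hidden + scale * steering_vec

# Toy usage: 8 collected representations of dimension 16.
reps = torch.randn(8, 16)
entropies = torch.rand(8)                  # entropy of each sampled action distribution
v = entropy_weighted_steering_vector(reps, entropies)

hidden_states = torch.randn(1, 5, 16)      # (batch, seq, d) activations at one layer
steered = steer_activations(hidden_states, v)
print(steered.shape)                       # torch.Size([1, 5, 16])
```

In this reading, the scale factor controls how strongly the intervention pushes the agent toward higher-entropy (more explorative) action distributions; in practice it would be tuned on the sequential decision-making task at hand.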

Locations

  • arXiv (Cornell University)

Similar Works

  • Can large language models explore in-context? (2024). Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
  • Entropy-Regularized Token-Level Policy Optimization for Large Language Models (2024). Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
  • Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning (2024). Yao-Hung Hubert Tsai, Walter Talbott, Jian Zhang
  • Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models (2023). Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy
  • Soft Self-Consistency Improves Language Model Agents (2024). Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
  • Multi-property Steering of Large Language Models with Dynamic Activation Composition (2024). Daniel Scalena, Gabriele Sarti, Malvina Nissim
  • Reinforcing Language Agents via Policy Optimization with Action Decomposition (2024). Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen
  • Disentangling Exploration of Large Language Models by Optimal Exploitation (2025). Tim Grams, Patrick Betz, Christian Bartelt
  • Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (2023). Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit
  • Exploring the landscape of large language models: Foundations, techniques, and challenges (2024). Milad Moradi, Ke Yan, David B. Colwell, Matthias Samwald, Rhona Asgari
  • Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering (2024). Joris Postmus, Steven Abreu
  • EVOLvE: Evaluating and Optimizing LLMs For Exploration (2024). Allen Nie, Yi Su, Bo Chang, Jonathan Lee, Ed H., Quoc V. Le, Minmin Chen
  • Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models (2024). Maryam Shoaeinaeini, Brent Harrison
  • Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback (2024). Sanjiban Choudhury, Paloma Sodhi
  • Enhancing Q-Learning with Large Language Model Heuristics (2024). Xiefeng Wu
  • Position: Foundation Agents as the Paradigm Shift for Decision Making (2024). Xiaoqian Liu, Xingzhou Lou, Jianbin Jiao, Junge Zhang
  • Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control (2024). Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Shen Xu, Jieping Ye
  • Efficient Exploration for LLMs (2024). Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
  • Empowering Large Language Model Agents through Action Learning (2024). Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhihong Deng, Hongxia Yang
  • Matryoshka: Learning to Drive Black-Box LLMs with LLMs (2024). Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. L. Dai, Chao Zhang, Bo Dai

Cited by (0)

Citing (0)