Controlling Large Language Model Agents with Entropic Activation Steering

Type: Preprint

Publication Date: 2024-05-31

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2406.00244

Abstract

The generality of pretrained large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. To be successful, such agents must form beliefs about how to achieve their goals based on limited interaction with their environment, resulting in uncertainty about the best action to take at each step. In this paper, we study how LLM agents form and act on these beliefs by conducting experiments in controlled sequential decision-making tasks. To begin, we find that LLM agents are overconfident: They draw strong conclusions about what to do based on insufficient evidence, resulting in inadequately explorative behavior. We dig deeper into this phenomenon and show how it emerges from a collapse in the entropy of the action distribution implied by sampling from the LLM. We then demonstrate that existing token-level sampling techniques are by themselves insufficient to make the agent explore more. Motivated by this fact, we introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. EAST computes a steering vector as an entropy-weighted combination of representations, and uses it to manipulate an LLM agent's uncertainty over actions by intervening on its activations during the forward pass. We show that EAST can reliably increase the entropy in an LLM agent's actions, causing more explorative behavior to emerge. Finally, EAST modifies the subjective uncertainty an LLM agent expresses, paving the way to interpreting and controlling how LLM agents represent uncertainty about their decisions.
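
The abstract's one-sentence description of EAST (a steering vector computed as an entropy-weighted combination of representations, added to the model's activations during the forward pass) can be illustrated with a brief sketch. The code below is an illustrative reading of that sentence, not the authors' implementation: the function names, the toy tensors, the layer at which the intervention happens, and the `scale` factor are all assumptions introduced here.

```python
# Minimal sketch of the mechanism described in the abstract, assuming:
# - reps[i] is a hidden representation (e.g., a last-token activation at some
#   layer) collected from the i-th prompt in a set of agent interactions,
# - entropies[i] is the entropy of the action distribution the agent produced
#   for that prompt.
# All names, shapes, and constants are illustrative, not the paper's code.

import torch

def entropy_weighted_steering_vector(reps: torch.Tensor,
                                     entropies: torch.Tensor) -> torch.Tensor:
    """Combine representations, weighting each one by its action-distribution entropy."""
    weights = entropies / entropies.sum()          # normalized entropy weights, shape (n,)
    return (weights.unsqueeze(-1) * reps).sum(0)   # steering vector, shape (d,)

def steer_activations(hidden: torch.Tensor,
                      steering_vec: torch.Tensor,
                      scale: float = 4.0) -> torch.Tensor:
    """Intervene on the forward pass by adding the scaled steering vector
    to the hidden states at a chosen layer."""
    return hidden + scale * steering_vec

# Toy usage: 8 collected representations of dimension 16.
reps = torch.randn(8, 16)
entropies = torch.rand(8)                  # entropy of each sampled action distribution
v = entropy_weighted_steering_vector(reps, entropies)

hidden_states = torch.randn(1, 5, 16)      # (batch, seq, d) activations at one layer
steered = steer_activations(hidden_states, v)
print(steered.shape)                       # torch.Size([1, 5, 16])
```

In this reading, the scale factor controls how strongly the intervention pushes the agent toward higher-entropy (more explorative) action distributions; in practice it would be tuned on the sequential decision-making task at hand.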

Locations

  • arXiv (Cornell University)

Similar Works

  • Can large language models explore in-context? (2024). Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
  • Entropy-Regularized Token-Level Policy Optimization for Large Language Models (2024). Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
  • Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning (2024). Yao-Hung Hubert Tsai, Walter Talbott, Jian Zhang
  • Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models (2023). Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy
  • Soft Self-Consistency Improves Language Model Agents (2024). Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
  • Multi-property Steering of Large Language Models with Dynamic Activation Composition (2024). Daniel Scalena, Gabriele Sarti, Malvina Nissim
  • Reinforcing Language Agents via Policy Optimization with Action Decomposition (2024). Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen
  • Disentangling Exploration of Large Language Models by Optimal Exploitation (2025). Tim Grams, Patrick Betz, Christian Bartelt
  • Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (2023). Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit
  • Exploring the landscape of large language models: Foundations, techniques, and challenges (2024). Milad Moradi, Ke Yan, David B. Colwell, Matthias Samwald, Rhona Asgari
  • Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering (2024). Joris Postmus, Steven Abreu
  • EVOLvE: Evaluating and Optimizing LLMs For Exploration (2024). Allen Nie, Yi Su, Bo Chang, Jonathan Lee, Ed H., Quoc V. Le, Minmin Chen
  • Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models (2024). Maryam Shoaeinaeini, Brent Harrison
  • Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback (2024). Sanjiban Choudhury, Paloma Sodhi
  • Enhancing Q-Learning with Large Language Model Heuristics (2024). Xiefeng Wu
  • Position: Foundation Agents as the Paradigm Shift for Decision Making (2024). Xiaoqian Liu, Xingzhou Lou, Jianbin Jiao, Junge Zhang
  • Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control (2024). Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Shen Xu, Jieping Ye
  • Efficient Exploration for LLMs (2024). Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
  • Empowering Large Language Model Agents through Action Learning (2024). Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhihong Deng, Hongxia Yang
  • Matryoshka: Learning to Drive Black-Box LLMs with LLMs (2024). Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. L. Dai, Chao Zhang, Bo Dai

Cited by (0)

Citing (0)