Are Language Models Worse than Humans at Following Prompts? It’s Complicated

Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick

Type: Article

Publication Date: 2023-01-01

Citations: 3

DOI: https://doi.org/10.18653/v1/2023.findings-emnlp.514

Locations

arXiv (Cornell University)

Similar Works

Title	Year	Authors
Are Language Models Worse than Humans at Following Prompts? It's Complicated	2023	Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?	2022	Albert Webson, Ellie Pavlick
Do Prompt-Based Models Really Understand the Meaning of their Prompts?	2021	Albert Webson, Ellie Pavlick
HREF: Human Response-Guided Evaluation of Instruction Following in Language Models	2024	Xinxi Lyu, Yizhong Wang, Hannaneh Hajishirzi, Pradeep Dasigi
Large Language Models Are Human-Level Prompt Engineers	2022	Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba
Calibrate Before Use: Improving Few-Shot Performance of Language Models	2021	Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
Too Big to Fool: Resisting Deception in Language Models	2024	Mohammad Reza Samsami, M. Richter, Juan Rodríguez, Megh Thakkar, Sarath Chandar, Maxime Gasse
Calibrate Before Use: Improving Few-Shot Performance of Language Models	2021	Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
Beyond Performance: Quantifying and Mitigating Label Bias in LLMs	2024	Yuval Reif, Roy Schwartz
Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models	2024	Yuan Yu, Lili Zhao, Kai Zhang, Guanjie Zheng, Qi Liu
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs	2024	Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen
Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models	2024	Yubo Yuan, L. Zhao, Kai Zhang, Guanjie Zheng, Qi Liu
Better Zero-Shot Reasoning with Self-Adaptive Prompting	2023	Xingchen Wan, Ruoxi Sun, Hanjun Dai, Sercan Ö. Arık, Tomas Pfister
Understanding How Model Size Affects Few-shot Instruction Prompting	2022	Ayrton San Joaquin, Ardy Haroen
Revisiting Automated Prompting: Are We Actually Doing Better?	2023	Yulin Zhou, Yiren Zhao, Ilia Shumailov, Robert F. Mullins, Yarin Gal
Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting	2022	Gabriel Orlanski
Self-Reflection Outcome is Sensitive to Prompt Construction	2024	Fengyuan Liu, Nouar AlDahoul, Gregory Eady, Yasir Zaki, Bedoor AlShebli, Talal Rahwan
Large Language Models are Null-Shot Learners	2024	Pittawat Taveekitworachai, Febri Abdullah, Ruck Thawonmas

Works That Cite This (3)

Title	Year	Authors
Can language models handle recursively nested grammatical structures? A case study on comparing models and humans	2024	Andrew K. Lampinen
COMPS: Conceptual Minimal Pair Sentences for testing Robust Property Knowledge and its Inheritance in Pre-trained Language Models	2023	Kanishka Misra, Julia Taylor Rayz, Allyson Ettinger
Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design	2024	Lindia Tjuatja, Valerie Chen, Tongshuang Wu, Ameet Talwalkar, Graham Neubig

Works Cited by This (13)

Title	Year	Authors
Character-level Convolutional Networks for Text Classification	2015	Xiang Zhang, Junbo Zhao, Yann LeCun
XNLI: Evaluating Cross-lingual Sentence Representations	2018	Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, Veselin Stoyanov
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference	2018	Adina Williams, Nikita Nangia, Samuel Bowman
BLiMP: The Benchmark of Linguistic Minimal Pairs for English	2020	Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng‐Fu Wang, Samuel R. Bowman
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?	2022	Albert Webson, Ellie Pavlick
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	2022	Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc V. Le, Denny Zhou
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models	2022	Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V. Le
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models	2022	Aarohi Srivastava, Abhinav Rastogi, Abhishek S. Rao, Abu Awal Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso
Language models show human-like content effects on reasoning tasks	2022	Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill
Finetuned Language Models Are Zero-Shot Learners	2021	Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le