Evaluating Generative AI Systems is a Social Science Measurement Challenge

Type: Preprint

Publication Date: 2024-11-16

Citations: 0

DOI: https://doi.org/10.48550/arXiv.2411.10939

Abstract

Across academia, industry, and government, there is an increasing awareness that the measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult. We argue that these measurement tasks are highly reminiscent of measurement tasks found throughout the social sciences. With this in mind, we present a framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems. The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves. This four-level approach differs from the way measurement is typically done in ML, where researchers and practitioners appear to jump straight from background concepts to measurement instruments, with little to no explicit systematization in between. As well as surfacing assumptions, thereby making it easier to understand exactly what the resulting measurements do and do not mean, this framework has two important implications for evaluating evaluations: First, it can enable stakeholders from different worlds to participate in conceptual debates, broadening the expertise involved in evaluating GenAI systems. Second, it brings rigor to operational debates by offering a set of lenses for interrogating the validity of measurement instruments and their resulting measurements.
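
To make the four levels concrete, here is a minimal Python sketch of how they might be represented as data. It is an illustration only, not code from the paper; every name in it (BackgroundConcept, SystematizedConcept, MeasurementInstrument, Measurement, and the toy lexicon-based scoring rule) is a hypothetical choice made to mirror the levels the abstract names.

    from dataclasses import dataclass
    from typing import Callable, List

    # Illustrative data model mirroring the framework's four levels.
    # All names here are hypothetical, not from the paper.

    @dataclass
    class BackgroundConcept:
        """Level 1: the broad, often contested concept (e.g., 'toxicity')."""
        name: str
        description: str

    @dataclass
    class SystematizedConcept:
        """Level 2: an explicit, scoped definition adopted for measurement."""
        background: BackgroundConcept
        definition: str   # the specific sense of the concept in use
        scope: str        # which outputs, tasks, or populations it covers

    @dataclass
    class MeasurementInstrument:
        """Level 3: a concrete procedure (benchmark, annotation protocol,
        classifier) that operationalizes the systematized concept."""
        concept: SystematizedConcept
        name: str
        procedure: Callable[[str], float]  # maps a system output to a score

    @dataclass
    class Measurement:
        """Level 4: an instance-level measurement from applying the instrument."""
        instrument: MeasurementInstrument
        system_output: str
        value: float

    def measure(instrument: MeasurementInstrument, outputs: List[str]) -> List[Measurement]:
        """Apply an instrument to system outputs, one measurement per output."""
        return [Measurement(instrument, o, instrument.procedure(o)) for o in outputs]

    # Toy usage: a deliberately crude 'toxicity' instrument. The lexicon-match
    # rule is a placeholder; a real instrument embodies many more design choices.
    toxicity = BackgroundConcept("toxicity", "Harmful or offensive language, broadly construed.")
    systematized = SystematizedConcept(toxicity, "Contains a term from a fixed slur lexicon.", "English chat outputs")
    instrument = MeasurementInstrument(systematized, "lexicon-match-rate", lambda text: float("badword" in text.lower()))
    for m in measure(instrument, ["hello there", "that badword again"]):
        print(m.instrument.name, repr(m.system_output), m.value)

Note that the systematized concept is a first-class object in this sketch: making that middle level explicit is precisely the step the abstract says ML practice tends to skip when jumping from background concepts straight to measurement instruments.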

Locations

  • arXiv (Cornell University)

Similar Works

  • A Shared Standard for Valid Measurement of Generative AI Systems' Capabilities, Risks, and Impacts (2024). Alexandra Chouldechova, Chad Atalla, Solon Barocas, A. Feder Cooper, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Nicholas Pangakis, Seann Reed, Emily Sheng
  • A Blueprint for Auditing Generative AI (2024). Jakob Mökander, Justin Curl, Mihir Kshirsagar
  • Sociotechnical Safety Evaluation of Generative AI Systems (2023). Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-García, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach
  • Dimensions of Generative AI Evaluation Design (2024). P. Alex Dow, Jennifer Wortman Vaughan, Solon Barocas, Chad Atalla, Alexandra Chouldechova, Hanna Wallach
  • Provocation on Expertise in Social Impact Evaluations of Generative AI (and Beyond) (2024). Zoe Kahn, Nitin Kohli
  • Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability (2024). Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing
  • Evaluating the Social Impact of Generative AI Systems in Systems and Society (2023). Irene Solaiman, Zeerak Talat, William S. Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé, Jesse Dodge, E. F. Evans, Sara Hooker
  • Assessing AI Impact Assessments: A Classroom Study (2023). Nari Johnson, Hoda Heidari
  • From Principles to Practice: An Accountability Metrics Catalogue for Managing AI Risks (2023). Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing
  • Generative AI and Its Educational Implications (2024). Kacper Łodzikowski, Peter W. Foltz, John T. Behrens
  • A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications (2023). Ahmed Magooda, Alec Helyar, Kyle Jackson, David Sullivan, Chad Atalla, Emily Sheng, Dan Vann, R. J. Edgar, Hamid Palangi, Roman Lutz
  • Using Scenario-Writing for Identifying and Mitigating Impacts of Generative AI (2024). Kimon Kieslich, Nicholas Diakopoulos, Natali Helberger
  • Provocation: Who benefits from "inclusion" in Generative AI? (2024). Neil F. Johnson, Siobhan Mackenzie Hall, Samantha Dalal
  • Data Equity: Foundational Concepts for Generative AI (2023). JoAnn Stonier, Lauren Woodman, Majed Alshammari, Renée Cummings, Nighat Dad, Arti Garg, Alberto Giovanni Busetto, Katherine Hsiao, Māui Hudson, Parminder Jeet Singh
  • Generative AI and the problem of existential risk (2024). Lynette Webb, Daniel Schönberger
  • Generative AI Toolkit -- a framework for increasing the quality of LLM-based applications over their whole life cycle (2024). Jens Kohl, Luisa Gloger, Rui Ponte Costa, Otto Kruse, Manuel P. Luitz, David L. Katz, Gonzalo Barbeito, Markus Schweier, Ryan French, Josh E. Schroeder
  • Crafting Tomorrow's Evaluations: Assessment Design Strategies in the Era of Generative AI (2024). R. W. Kadel, Bhupesh Kumar Mishra, Samar Shailendra, Samia Abid, Maneeha Rani, Shiva Prasad Mahato
  • Measurement in AI Policy: Opportunities and Challenges (2020). Saurabh Mishra, Jack Clark, C. Raymond Perrault
  • Toward General Design Principles for Generative AI Applications (2023). Justin D. Weisz, Michael Müller, Jessica He, Stephanie Houde
  • Navigating the generative AI era: Introducing the AI assessment scale for ethical GenAI assessment (2023). Mike Perkins, Leon Furze, Jasper Roe, Jason MacVaugh

Works That Cite This (0)

None.

Works Cited by This (0)

None.