Evaluation Gaps in Machine Learning Practice

Type: Article

Publication Date: 2022-06-20

Citations: 20

DOI: https://doi.org/10.1145/3531146.3533233

Abstract

Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors including harms, benefits, and responsibilities. In practice, however, evaluations of ML models frequently focus on only a narrow range of decontextualized predictive behaviours. We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations. Through an empirical study of papers from recent high-profile conferences in the Computer Vision and Natural Language Processing communities, we demonstrate a general focus on a handful of evaluation methods. By considering the metrics and test data distributions used in these methods, we draw attention to which properties of models are centered in the field, revealing the properties that are frequently neglected or sidelined during evaluation. By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts; these include commitments to consequentialism, abstractability from context, the quantifiability of impacts, the limited role of model inputs in evaluation, and the equivalence of different failure modes. Shedding light on these assumptions enables us to question their appropriateness for ML system contexts, pointing the way towards more contextualized evaluation methodologies for robustly examining the trustworthiness of ML models.
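
To make the contrast concrete, the sketch below (a minimal Python illustration, not taken from the paper; the toy data, subgroup labels, and cost weights are all hypothetical) compares the kind of narrow evaluation the abstract describes, a single aggregate metric on one test distribution, with a disaggregated, cost-weighted view that drops the assumption that all failure modes are equivalent.

```python
# Illustrative sketch only: contrasts a single aggregate metric (the "narrow"
# evaluation practice discussed in the paper) with a disaggregated, cost-weighted
# evaluation. All data, subgroup labels, and cost values are hypothetical.
from collections import defaultdict

def aggregate_accuracy(y_true, y_pred):
    """One decontextualized number over the whole test set."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy per subgroup, surfacing gaps the aggregate hides."""
    totals, hits = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

def cost_weighted_error(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Weighs false positives and false negatives differently instead of
    treating all failure modes as equivalent."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 0:
            cost += cost_fp          # false positive
        elif p == 0 and t == 1:
            cost += cost_fn          # false negative
    return cost / len(y_true)

if __name__ == "__main__":
    # Hypothetical binary predictions over two subgroups "a" and "b".
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

    print("aggregate accuracy:  ", aggregate_accuracy(y_true, y_pred))
    print("per-group accuracy:  ", disaggregated_accuracy(y_true, y_pred, groups))
    print("cost-weighted error: ", cost_weighted_error(y_true, y_pred))
```

On this toy data the aggregate accuracy of 0.625 conceals a subgroup gap (0.75 for group "a" versus 0.50 for group "b") and says nothing about how costly each kind of error is in the deployment context, which is exactly the sort of information a purely aggregate, i.i.d. evaluation leaves out.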

Locations

  • arXiv (Cornell University)
  • 2022 ACM Conference on Fairness, Accountability, and Transparency

Similar Works

  • Evaluation Gaps in Machine Learning Practice (2022). Ben Hutchinson, Negar Rostamzadeh, Christina M. Greer, Katherine Heller, Vinodkumar Prabhakaran
  • On the Value of ML Models (2021). Fabio Casati, Pierre‐AndrĂ© NoĂ«l, Jie Yang
  • Frustrated with Replicating Claims of a Shared Model? A Solution (2018). Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen‐mei Hwu
  • Rethinking and Recomputing the Value of ML Models (2022). Burcu Sayin, Fabio Casati, Andrea Passerini, Jie Yang, Xinyue Chen
  • Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking (2019). Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen‐mei Hwu
  • CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability (2021). Martin Mundt, Steven Lang, Quentin Delfosse, Kristian Kersting
  • Interpretable Machine Learning: Moving From Mythos to Diagnostics (2021). Valerie Chen, Jeffrey Li, Joon Sik Kim, Gregory Plumb, Ameet Talwalkar
  • Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research (2024). Daniel VranjeĆĄ, Oliver Niggemann
  • Towards Connecting Use Cases and Methods in Interpretable Machine Learning (2021). Valerie Chen, Jeffrey Li, Joon Sik Kim, Gregory Plumb, Ameet Talwalkar
  • Machine Learning that Matters (2012). Kiri L. Wagstaff
  • Pitfalls in Machine Learning Research: Reexamining the Development Cycle (2020). Stella Biderman, Walter J. Scheirer
  • What is it for a Machine Learning Model to Have a Capability? (2024). Jacqueline Harding, Nathaniel Sharadin
  • Model Monitoring in the Absence of Labelled Truth Data via Feature Attributions Distributions (2025). Carlos Mougan
  • Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning (2024). Andrea Apicella, Francesco IsgrĂČ, Roberto Prevete
  • Prescriptive and Descriptive Approaches to Machine-Learning Transparency (2022). David Adkins, Bilal Alsallakh, Adeel Cheema, Narine Kokhlikyan, Emily McReynolds, Pushkar Mishra, Chavez Procope, Jeremy Sawruk, Erin Wang, Polina Zvyagina
  • Towards Clear Expectations for Uncertainty Estimation (2022). Victor Bouvier, Simona Maggio, Alexandre Abraham, LĂ©o Dreyfus-Schmidt
  • MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities (2023). Katherine R. Maffey, Kyle Dotterrer, Jennifer Niemann, Iain J. Cruickshank, Grace A. Lewis, Christian KĂ€stner