Thinking Beyond Distributions in Testing Machine Learned Models

Type: Preprint

Publication Date: 2021-12-06

Citations: 0

Abstract

Testing practices within the machine learning (ML) community have centered around assessing a learned model's predictive performance measured against a test dataset, often drawn from the same distribution as the training dataset. While recent work on robustness and fairness testing within the ML community has pointed to the importance of testing against distributional shifts, these efforts also focus on estimating the likelihood of the model making an error against a reference dataset/distribution. We argue that this view of testing actively discourages researchers and developers from looking into other sources of robustness failures, for instance corner cases which may have severe undesirable impacts. We draw parallels with decades of work within software engineering testing focused on assessing a software system against various stress conditions, including corner cases, as opposed to solely focusing on average-case behaviour. Finally, we put forth a set of recommendations to broaden the view of machine learning testing to a rigorous practice.
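
To make the contrast drawn in the abstract concrete, the following is a minimal, purely illustrative sketch (not code from the paper): a conventional evaluation estimates accuracy on a held-out test set drawn from the training distribution, while stress-style checks, in the spirit of software engineering testing, probe corner cases and invariances directly. The dataset, model, perturbation scale, and thresholds are hypothetical choices made only for this demonstration.

# Illustrative sketch only (not code from the paper): contrast average-case
# accuracy on an i.i.d. test split with simple stress/corner-case checks.
# The dataset, model, perturbation scale, and thresholds are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1) Distribution-centric evaluation: error rate estimated against a test set
#    drawn from the same distribution as the training data.
print("average-case accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 2) Stress-style checks that do not reduce to an error-rate estimate.
# 2a) Corner case: extreme feature values should not break the model.
extreme_inputs = np.clip(X_test * 10.0, -1e3, 1e3)
assert np.isfinite(model.predict_proba(extreme_inputs)).all(), \
    "non-finite outputs on extreme inputs"

# 2b) Invariance: predictions should be stable under negligible perturbations
#     (a metamorphic-style relation that needs no reference labels).
noise = np.random.default_rng(0).normal(scale=1e-6, size=X_test.shape)
flip_rate = (model.predict(X_test) != model.predict(X_test + noise)).mean()
assert flip_rate < 0.01, f"{flip_rate:.1%} of predictions flip under tiny noise"

The point of the second group of checks is that they can surface severe failure modes, such as non-finite outputs or unstable predictions, that an average error rate over an i.i.d. test set would not reveal.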

Locations

  • arXiv (Cornell University)

Similar Works

  • Thinking Beyond Distributions in Testing Machine Learned Models (2021). Negar Rostamzadeh, Ben Hutchinson, Christina Greer, Vinodkumar Prabhakaran
  • An empirical study of testing machine learning in the wild (2024). Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming Jiang, Mouna Abidi, Ahmed E. Hassan
  • Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild (2023). Vihari Piratla
  • A Review on Oracle Issues in Machine Learning (2021). Diogo Seca
  • A Holistic Assessment of the Reliability of Machine Learning Systems (2023). Anthony Corso, David Karamadian, Romeo Valentin, Mary Reich Cooper, Mykel J. Kochenderfer
  • Studying the Practices of Testing Machine Learning Software in the Wild (2023). Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming Jiang, Mouna Abidi, Ahmed E. Hassan
  • Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations (2023). Salah Ghamizi, Maxime Cordy, Yuejun Guo, Mike Papadakis, Yves Le Traon
  • Run, Forest, Run? On Randomization and Reproducibility in Predictive Software Engineering (2020). Cynthia C. S. Liem, Annibale Panichella
  • Machine Learning Robustness: A Primer (2024). Houssem Ben Braiek, Foutse Khomh
  • Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects (2020). Steffen Herbold, Tobias von der Haar
  • Smoke testing for machine learning: simple tests to discover severe bugs (2022). Steffen Herbold, Tobias von der Haar
  • Frustrated with replicating claims of a shared model? a solution (2018). Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen‐mei Hwu
  • Tracking the risk of a deployed model and detecting harmful distribution shifts (2021). Aleksandr Podkopaev, Aaditya Ramdas
  • Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap (2023). Carlos Mougan, Dan Saattrup Nielsen
  • Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (2019). Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton
  • Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (2018). Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton
  • Test & Evaluation Best Practices for Machine Learning-Enabled Systems (2023). Jaganmohan Chandrasekaran, Tyler Cody, Nicola McCarthy, Erin Lanus, Laura Freeman
Works That Cite This (0)

Works Cited by This (0)
