Thinking Beyond Distributions in Testing Machine Learned Models

Type: Preprint

Publication Date: 2021-12-06

Citations: 0

Abstract

Testing practices within the machine learning (ML) community have centered around assessing a learned model's predictive performance measured against a test dataset, often drawn from the same distribution as the training dataset. While recent work on robustness and fairness testing within the ML community has pointed to the importance of testing against distributional shifts, these efforts also focus on estimating the likelihood of the model making an error against a reference dataset/distribution. We argue that this view of testing actively discourages researchers and developers from looking into other sources of robustness failures, for instance corner cases which may have severe undesirable impacts. We draw parallels with decades of work within software engineering testing focused on assessing a software system against various stress conditions, including corner cases, as opposed to solely focusing on average-case behaviour. Finally, we put forth a set of recommendations to broaden the view of machine learning testing to a rigorous practice.
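
To make the contrast drawn in the abstract concrete, the following is a minimal, purely illustrative sketch (not code from the paper): a conventional evaluation estimates accuracy on a held-out test set drawn from the training distribution, while stress-style checks, in the spirit of software engineering testing, probe corner cases and invariances directly. The dataset, model, perturbation scale, and thresholds are hypothetical choices made only for this demonstration.

# Illustrative sketch only (not code from the paper): contrast average-case
# accuracy on an i.i.d. test split with simple stress/corner-case checks.
# The dataset, model, perturbation scale, and thresholds are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1) Distribution-centric evaluation: error rate estimated against a test set
#    drawn from the same distribution as the training data.
print("average-case accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 2) Stress-style checks that do not reduce to an error-rate estimate.
# 2a) Corner case: extreme feature values should not break the model.
extreme_inputs = np.clip(X_test * 10.0, -1e3, 1e3)
assert np.isfinite(model.predict_proba(extreme_inputs)).all(), \
    "non-finite outputs on extreme inputs"

# 2b) Invariance: predictions should be stable under negligible perturbations
#     (a metamorphic-style relation that needs no reference labels).
noise = np.random.default_rng(0).normal(scale=1e-6, size=X_test.shape)
flip_rate = (model.predict(X_test) != model.predict(X_test + noise)).mean()
assert flip_rate < 0.01, f"{flip_rate:.1%} of predictions flip under tiny noise"

The point of the second group of checks is that they can surface severe failure modes, such as non-finite outputs or unstable predictions, that an average error rate over an i.i.d. test set would not reveal.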

Locations

  • arXiv (Cornell University)

Similar Works

  • Thinking Beyond Distributions in Testing Machine Learned Models (2021). Negar Rostamzadeh, Ben Hutchinson, Christina Greer, Vinodkumar Prabhakaran
  • An empirical study of testing machine learning in the wild (2024). Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming Jiang, Mouna Abidi, Ahmed E. Hassan
  • Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild (2023). Vihari Piratla
  • A Review on Oracle Issues in Machine Learning (2021). Diogo Seca
  • A Holistic Assessment of the Reliability of Machine Learning Systems (2023). Anthony Corso, David Karamadian, Romeo Valentin, Mary Reich Cooper, Mykel J. Kochenderfer
  • Studying the Practices of Testing Machine Learning Software in the Wild (2023). Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming Jiang, Mouna Abidi, Ahmed E. Hassan
  • Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations (2023). Salah Ghamizi, Maxime Cordy, Yuejun Guo, Mike Papadakis, Yves Le Traon
  • Run, Forest, Run? On Randomization and Reproducibility in Predictive Software Engineering (2020). Cynthia C. S. Liem, Annibale Panichella
  • Machine Learning Robustness: A Primer (2024). Houssem Ben Braiek, Foutse Khomh
  • Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects (2020). Steffen Herbold, Tobias von der Haar
  • Smoke testing for machine learning: simple tests to discover severe bugs (2022). Steffen Herbold, Tobias von der Haar
  • Frustrated with replicating claims of a shared model? a solution (2018). Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen‐mei Hwu
  • Tracking the risk of a deployed model and detecting harmful distribution shifts (2021). Aleksandr Podkopaev, Aaditya Ramdas
  • Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap (2023). Carlos Mougan, Dan Saattrup Nielsen
  • Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (2019). Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton
  • Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (2018). Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton
  • Test & Evaluation Best Practices for Machine Learning-Enabled Systems (2023). Jaganmohan Chandrasekaran, Tyler Cody, Nicola McCarthy, Erin Lanus, Laura Freeman
Works That Cite This (0)

Works Cited by This (0)
