An Almost Unbiased Method of Obtaining Confidence Intervals for the Probability of Misclassification in Discriminant Analysis

Peter A. Lachenbruch

Type: Article

Publication Date: 1967-12-01

Citations: 506

DOI: https://doi.org/10.2307/2528418

Abstract

In a practical classifier design problem, the true population is generally unknown and the available sample is finite-sized. A common approach is to use a resampling technique to estimate the performance of the classifier that will be trained with the available sample. We conducted a Monte Carlo simulation study to compare the ability of different resampling techniques to train the classifier and predict its performance under the constraint of a finite-sized sample. The true population for the two classes was assumed to follow multivariate normal distributions with known covariance matrices. Finite sets of sample vectors were drawn from the population. The true performance of the classifier is defined as the area under the receiver operating characteristic curve (AUC) when the classifier designed with the specific sample is applied to the true population. We investigated methods based on the Fukunaga-Hayes and the leave-one-out techniques, as well as three different types of bootstrap methods, namely, the ordinary, 0.632, and 0.632+ bootstrap. Fisher's linear discriminant analysis was used as the classifier. The dimensionality of the feature space was varied from 3 to 15. The sample size n_2 from the positive class was varied between 25 and 60, while the number of cases from the negative class was either equal to n_2 or 3n_2. Each experiment was performed with an independent dataset randomly drawn from the true population. Using a total of 1000 experiments for each simulation condition, we compared the bias, the variance, and the root-mean-squared error (RMSE) of the AUC estimated using the different resampling techniques relative to the true AUC (obtained from training on a finite dataset and testing on the population). Our results indicated that, under the study conditions, there can be a large difference in the RMSE obtained using different resampling methods, especially when the feature space dimensionality is relatively large and the sample size is small. Under these conditions, the 0.632 and 0.632+ bootstrap methods have the lowest RMSE, indicating that the difference between the estimated and the true performances obtained using the 0.632 and 0.632+ bootstrap will be statistically smaller than those obtained using the other three resampling methods. Of the three bootstrap methods, the 0.632+ bootstrap provides the lowest bias. Although this investigation is performed under some specific conditions, it reveals important trends for the problem of classifier performance prediction under the constraint of a limited dataset.
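
As an illustration of the 0.632 bootstrap named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes NumPy, a two-class problem with labels 0 and 1, at least two features, an unstratified bootstrap, and the Mann-Whitney statistic as the AUC estimate. The function names (`fit_lda`, `auc`, `bootstrap_632_auc`) and the parameters `n_boot` and `seed` are hypothetical, and applying the 0.368/0.632 weighting to AUC rather than to an error rate is an assumption based on the abstract's description.

```python
import numpy as np


def fit_lda(X, y):
    """Fisher's linear discriminant: return the weight vector w."""
    X0, X1 = X[y == 0], X[y == 1]
    # Pooled within-class covariance, weighted by each class's degrees of freedom.
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
         + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    return np.linalg.solve(S, X1.mean(axis=0) - X0.mean(axis=0))


def auc(scores, y):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 0.5)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_632_auc(X, y, n_boot=200, seed=0):
    """0.632 bootstrap estimate of AUC (assumed form: 0.368*resub + 0.632*out-of-bag)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution AUC: train and test on the same full sample (optimistic).
    auc_resub = auc(X @ fit_lda(X, y), y)
    oob_aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample n cases with replacement
        out = np.setdiff1d(np.arange(n), idx)     # cases left out of this replicate
        # Skip replicates where a class is too small to train on or test with.
        if min(np.sum(y[idx] == 0), np.sum(y[idx] == 1)) < 2:
            continue
        if len(np.unique(y[out])) < 2:
            continue
        w = fit_lda(X[idx], y[idx])
        oob_aucs.append(auc(X[out] @ w, y[out]))
    if not oob_aucs:
        raise ValueError("no usable bootstrap replicates")
    return 0.368 * auc_resub + 0.632 * float(np.mean(oob_aucs))


if __name__ == "__main__":
    # Synthetic two-class Gaussian data in 3 dimensions, 30 cases per class.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(1, 1, (30, 3))])
    y = np.repeat([0, 1], 30)
    print(bootstrap_632_auc(X, y))
```

The weighting combines the optimistic resubstitution estimate with the pessimistic out-of-bag estimate. The 0.632+ variant mentioned in the abstract adjusts these weights according to the degree of overfitting and is not shown here.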

Locations

Deep Blue (University of Michigan)
PubMed
Biometrics

Similar Works

Title	Year	Authors
Estimating the probabilities of misclassification in discriminant analysis	1982	Juan Enrique Ramos
On the estimation of the expected probability of misclassification in discriminant analysis with mixed binary and continuous variables	1986	Ioannis G. Vlachonikolis
Bayesian estimation for misclassification rate in linear discriminant analysis	2021	Koshiro Yonenaga Akio Suzukawa
Errors of misclassification in discriminant analysis.	1994	Pil Soo Park
Asymptotic estimate of probability of misclassification for discriminant rules based on density estimates	1989	Kamal C. Chanda F.H. Ruymgaart
An evaluation of smoothed error rate estimators in discriminant analysis	1983	Steven Snapinn
On the problem of bias in error rate estimation for discriminant analysis	1971	Lars Emil Larsen D.O. Walter J.J. McNew W. R. Adey
On inferring the probability of misclassification by the linear discriminant function	1973	S. John
Commentary on "Estimation of Error Rates in Discriminant Analysis"	1968	W. G. Cochran
Misclassification analysis of discriminant model	2023	Liwen Huang
Bounds for the Bayes Error in Classification: A Bayesian Approach Using Discriminant Analysis	2006	T. Pham‐Gia N. Turkkan Andriëtte Bekker
Bootstrap Methods for Error Rate Estimation in Discriminant Analysis	1992	Sadanori Konishi Masayuki Honda
An Asymptotic Unbiased Technique for Estimating the Error Rates in Discriminant Analysis	1974	Geoffrey J. McLachlan
Applications of measures of uncertainty in discriminant analysis	1988	David Hirst Ian Ford Frank Critchley
The Performance of the Linear Discriminant Function in Nonoptimal Situations and the Estimation of Classification Error Rates: A Review of Recent Findings	1979	William R. Dillon
Discriminant Analysis When the Initial Samples Are Misclassified	1966	Peter A. Lachenbruch
Estimation of Error Rates in Several-Population Discriminant Analysis	1982	Stephen C. Hora James B. Wilcox

Works That Cite This (141)

Title	Year	Authors
Additive estimators for probabilities of correct classification	1978	Ned Glick
Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics	2019	Fred L. Bookstein
Ovariectomized rat model and shape variation in the bony labyrinth	2022	Devin L Ward Lauren Schroeder Alexander Tinius Sarah Niccoli Riley Voth Simon J. Lees Mary Silcox Bence Viola Paolo Sanzo
Three-group classification with unequal misclassification costs: a mathematical programming approach	2001	Constantine Loucopoulos
Comparison of various procedures for estimation of the classification error in discriminance analysis	1980	K.‐D. Wernecke G. Kalb Ekkehard Stürzebecher
An evaluation of smoothed error rate estimators in discriminant analysis	1983	Steven Snapinn
On the Determination of Hominid Affinities	1984	G. N. van Vark
The Relationship in Terms of Asymptotic Mean Square Error Between the Separate Problems of Estimating each of the Three Types of Error Rate of the Linear Discriminant Function	1974	Geoffrey J. McLachlan
Morphometric variability among the species of the Sordida subcomplex (Hemiptera: Reduviidae: Triatominae): evidence for differentiation across the distribution range of Triatoma sordida	2017	Julieta Nattero Romina V. Piccinali Catarina Macedo Lopes María Laura Hernández Luciana Abrahan Patricia A. Lobbia Claudia Rodríguez Ana Laura Carbajal-de-la-Fuente
Discriminant Analysis	1975	Carl J. Huberty

Works Cited by This (12)

Title	Year	Authors
On the generalized distance in statistics	1936	P. C. Mahalanobis
Comparison of Non-Parametric Methods for Assessing Classifier Performance in Terms of ROC Parameters	2005	Waleed A. Yousef Robert F. Wagner Murray H. Loew
Bibliography on estimation of misclassification	1974	Godfried Toussaint
Estimating the uncertainty in the estimated mean area under the ROC curve of a classifier	2005	Waleed A. Yousef Robert F. Wagner Murray H. Loew
Recent advances in error rate estimation	1986	David J. Hand
A Solution to the Problem of Optimum Classification	1949	Paul G. Hoel Raymond P. Peterson
Effects of sample size in classifier design	1989	Keinosuke Fukunaga R.R. Hayes
An Almost Unbiased Method of Obtaining Confidence Intervals for the Probability of Misclassification in Discriminant Analysis	1967	Peter A. Lachenbruch
Estimation of Error Rates in Discriminant Analysis	1968	Peter A. Lachenbruch M. R. Mickey
Introduction to the Bootstrap World	2003	Dennis D. Boos