An Almost Unbiased Method of Obtaining Confidence Intervals for the Probability of Misclassification in Discriminant Analysis

Type: Article

Publication Date: 1967-12-01

Citations: 506

DOI: https://doi.org/10.2307/2528418

Abstract

In a practical classifier design problem, the true population is generally unknown and the available sample is of finite size. A common approach is to use a resampling technique to estimate the performance of the classifier that will be trained with the available sample. We conducted a Monte Carlo simulation study to compare how well different resampling techniques train the classifier and predict its performance under the constraint of a finite sample. The true populations for the two classes were assumed to be multivariate normal distributions with known covariance matrices. Finite sets of sample vectors were drawn from these populations. The true performance of the classifier is defined as the area under the receiver operating characteristic curve (AUC) when the classifier designed with the specific sample is applied to the true population. We investigated methods based on the Fukunaga-Hayes and the leave-one-out techniques, as well as three types of bootstrap methods, namely the ordinary, 0.632, and 0.632+ bootstrap. Fisher's linear discriminant analysis was used as the classifier. The dimensionality of the feature space was varied from 3 to 15. The sample size n₂ from the positive class was varied between 25 and 60, while the number of cases from the negative class was either equal to n₂ or 3n₂. Each experiment was performed with an independent dataset randomly drawn from the true population. Using a total of 1000 experiments for each simulation condition, we compared the bias, the variance, and the root-mean-squared error (RMSE) of the AUC estimated with the different resampling techniques relative to the true AUC (obtained by training on a finite dataset and testing on the population). Our results indicated that, under the study conditions, the RMSE can differ substantially between resampling methods, especially when the feature space dimensionality is relatively large and the sample size is small. Under these conditions, the 0.632 and 0.632+ bootstrap methods have the lowest RMSE, indicating that the difference between the estimated and the true performance obtained with the 0.632 and 0.632+ bootstrap will be statistically smaller than that obtained with the other three resampling methods. Of the three bootstrap methods, the 0.632+ bootstrap provides the lowest bias. Although this investigation was performed under specific conditions, it reveals important trends for the problem of classifier performance prediction under the constraint of a limited dataset.
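The abstract compares AUC estimates from the 0.632 and 0.632+ bootstrap. As a minimal sketch, and not the authors' implementation, the Python below computes an empirical AUC with the Mann-Whitney statistic and then blends a resubstitution AUC with an out-of-bag bootstrap AUC using weights adapted from Efron and Tibshirani's 0.632 and 0.632+ error-rate estimators. The function names are hypothetical, and the no-information AUC of 0.5 used by `auc_632_plus` is an assumption about how the error-rate formula carries over to AUC.

```python
"""Sketch: 0.632 and 0.632+ bootstrap combinations for AUC (assumed AUC adaptation)."""

from typing import Sequence


def mann_whitney_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_632(auc_resub: float, auc_oob: float) -> float:
    """0.632 estimate: a fixed blend of the optimistic resubstitution AUC
    and the pessimistic out-of-bag (left-out cases) bootstrap AUC."""
    return 0.368 * auc_resub + 0.632 * auc_oob


def auc_632_plus(auc_resub: float, auc_oob: float, auc_noinfo: float = 0.5) -> float:
    """0.632+ estimate: the weight on the out-of-bag AUC grows with the
    relative overfitting rate R. Assumes a no-information AUC of 0.5."""
    # Clip the out-of-bag AUC so it is never worse than chance.
    oob = max(auc_oob, auc_noinfo)
    if auc_resub > oob and auc_resub > auc_noinfo:
        # Relative overfitting rate, in [0, 1] after clipping.
        r = (auc_resub - oob) / (auc_resub - auc_noinfo)
    else:
        r = 0.0  # no measurable overfitting: the weight reduces to 0.632
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * auc_resub + w * oob
```

For example, with a resubstitution AUC of 1.0 and an out-of-bag AUC of 0.8, `auc_632` gives 0.8736 while `auc_632_plus` gives about 0.852: the relative overfitting rate is R = 0.4, so the 0.632+ weight shifts further toward the out-of-bag value.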

Locations

  • Deep Blue (University of Michigan)
  • PubMed
  • Biometrics

Similar Works

  • Estimating the probabilities of misclassification in discriminant analysis (1982). Juan Enrique Ramos
  • On the estimation of the expected probability of misclassification in discriminant analysis with mixed binary and continuous variables (1986). Ioannis G. Vlachonikolis
  • Bayesian estimation for misclassification rate in linear discriminant analysis (2021). Koshiro Yonenaga, Akio Suzukawa
  • Errors of misclassification in discriminant analysis (1994). Pil Soo Park
  • Asymptotic estimate of probability of misclassification for discriminant rules based on density estimates (1989). Kamal C. Chanda, F.H. Ruymgaart
  • An evaluation of smoothed error rate estimators in discriminant analysis (1983). Steven Snapinn
  • On the problem of bias in error rate estimation for discriminant analysis (1971). Lars Emil Larsen, D.O. Walter, J.J. McNew, W. R. Adey
  • On inferring the probability of misclassification by the linear discriminant function (1973). S. John
  • Commentary on "Estimation of Error Rates in Discriminant Analysis" (1968). W. G. Cochran
  • Misclassification analysis of discriminant model (2023). Liwen Huang
  • Bounds for the Bayes Error in Classification: A Bayesian Approach Using Discriminant Analysis (2006). T. Pham‐Gia, N. Turkkan, Andriëtte Bekker
  • Bootstrap Methods for Error Rate Estimation in Discriminant Analysis (1992). Sadanori Konishi, Masayuki Honda
  • An Asymptotic Unbiased Technique for Estimating the Error Rates in Discriminant Analysis (1974). Geoffrey J. McLachlan
  • Applications of measures of uncertainty in discriminant analysis (1988). David Hirst, Ian Ford, Frank Critchley
  • The Performance of the Linear Discriminant Function in Nonoptimal Situations and the Estimation of Classification Error Rates: A Review of Recent Findings (1979). William R. Dillon
  • Discriminant Analysis When the Initial Samples Are Misclassified (1966). Peter A. Lachenbruch
  • Estimation of Error Rates in Several-Population Discriminant Analysis (1982). Stephen C. Hora, James B. Wilcox

Works That Cite This (141)

  • Additive estimators for probabilities of correct classification (1978). Ned Glick
  • Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics (2019). Fred L. Bookstein
  • Ovariectomized rat model and shape variation in the bony labyrinth (2022). Devin L Ward, Lauren Schroeder, Alexander Tinius, Sarah Niccoli, Riley Voth, Simon J. Lees, Mary Silcox, Bence Viola, Paolo Sanzo
  • Three-group classification with unequal misclassification costs: a mathematical programming approach (2001). Constantine Loucopoulos
  • Comparison of various procedures for estimation of the classification error in discriminance analysis (1980). K.‐D. Wernecke, G. Kalb, Ekkehard Stürzebecher
  • An evaluation of smoothed error rate estimators in discriminant analysis (1983). Steven Snapinn
  • On the Determination of Hominid Affinities (1984). G. N. van Vark
  • The Relationship in Terms of Asymptotic Mean Square Error Between the Separate Problems of Estimating each of the Three Types of Error Rate of the Linear Discriminant Function (1974). Geoffrey J. McLachlan
  • Morphometric variability among the species of the Sordida subcomplex (Hemiptera: Reduviidae: Triatominae): evidence for differentiation across the distribution range of Triatoma sordida (2017). Julieta Nattero, Romina V. Piccinali, Catarina Macedo Lopes, María Laura Hernández, Luciana Abrahan, Patricia A. Lobbia, Claudia Rodríguez, Ana Laura Carbajal-de-la-Fuente
  • Discriminant Analysis (1975). Carl J. Huberty