By modifying the statistic of Malkovich and Afifi (1973) [Malkovich, J. F., Afifi, A. A. (1973). On tests for multivariate normality. J. Amer. Statist. Assoc. 68: 176–179], we introduce and study the properties of a notion of multivariate skewness that provides both a magnitude and an overall direction for the skewness present in multivariate data. This notion leads to a test statistic for the nonparametric null hypothesis of multivariate symmetry. Under mild assumptions, we find the asymptotic distribution of the test statistic and evaluate, by simulation, the convergence of the finite sample size percentiles to their limits. We also present an associated test statistic for multivariate normality.
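Malkovich and Afifi's statistic takes the largest squared univariate skewness over all one-dimensional projections of the data. The following Python sketch approximates that projection search by sampling random unit directions; it illustrates only the underlying idea, not the modified statistic (or its direction component) introduced in the paper.

```python
import numpy as np

def projection_skewness(X, n_dirs=2000, seed=0):
    """Approximate Malkovich-Afifi-type projection skewness: the largest
    squared sample skewness of u'X over random unit directions u, returned
    together with the maximizing direction. (Illustrative sketch only.)"""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # random unit directions
    P = X @ U.T                                     # projections, shape (n, n_dirs)
    Z = (P - P.mean(axis=0)) / P.std(axis=0)
    skews = (Z ** 3).mean(axis=0)                   # univariate sample skewness
    best = np.argmax(skews ** 2)
    return skews[best] ** 2, U[best]

X = np.random.default_rng(1).gamma(2.0, size=(500, 3))  # skewed toy data
print(projection_skewness(X))
```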
For data living in a manifold $M\subseteq \mathbb{R}^m$ and a point $p\in M$ we consider a statistic $U_{k,n}$ which estimates the variance of the angle between pairs of vectors $X_i-p$ and $X_j-p$, for data points $X_i$, $X_j$ near $p$, and evaluate this statistic as a tool for estimation of the intrinsic dimension of $M$ at $p$. Consistency of the local dimension estimator is established and the asymptotic distribution of $U_{k,n}$ is found under minimal regularity assumptions. Performance of the proposed methodology is compared against state-of-the-art methods on simulated data.
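A small illustration of the angle statistic: near $p$, angles between pairs of centered data vectors concentrate around $\pi/2$ as the intrinsic dimension grows, so their sample variance carries dimension information. A minimal Python sketch follows, assuming $p$ itself is not one of the data rows; the paper's exact normalization and estimator may differ.

```python
import numpy as np

def angle_variance(X, p, k):
    """Sample variance of the angles between X_i - p and X_j - p over the
    k nearest neighbours of p (a U_{k,n}-type quantity; sketch only)."""
    D = X - p
    idx = np.argsort(np.linalg.norm(D, axis=1))[:k]   # k nearest neighbours of p
    V = D[idx]
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    C = np.clip(V @ V.T, -1.0, 1.0)                   # pairwise cosines
    angles = np.arccos(C[np.triu_indices(k, 1)])      # distinct pairs i < j
    return angles.var()

X = np.random.default_rng(0).normal(size=(2000, 5))
print(angle_variance(X, p=np.zeros(5), k=100))
```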
We propose two kinds of permutation tests for the two-sample problem for functional data. One is based on nearest neighbours and the other on functional depths.
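The permutation scheme common to both tests is straightforward; here is a minimal Python sketch of a generic two-sample permutation test, where `stat` stands in for any two-sample statistic (e.g. one based on nearest neighbours or on functional depths). The example statistic at the end is an assumption for illustration, not the paper's.

```python
import numpy as np

def permutation_pvalue(stat, A, B, n_perm=999, seed=0):
    """Pool the curves, reshuffle group labels, and compare the observed
    statistic with its permutation distribution (sketch only)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([A, B])
    m = len(A)
    observed = stat(A, B)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if stat(pooled[perm[:m]], pooled[perm[m:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy statistic: sup-distance between group mean curves on a common grid.
mean_dist = lambda A, B: np.abs(A.mean(axis=0) - B.mean(axis=0)).max()
A = np.random.default_rng(1).normal(size=(20, 50))
B = np.random.default_rng(2).normal(loc=0.3, size=(20, 50))
print(permutation_pvalue(mean_dist, A, B))
```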
The use of graph-theoretical structures, principally the Minimal Spanning Tree and the k Nearest Neighbors Graph, allows the multivariate generalization of certain nonparametric statistical procedures and suggests the consideration of new ones. The areas of statistical application of graph theoretic procedures include the two-sample problem, clustering, outlier identification, hypothesis testing, and dimension identification.
We discuss the advantages of using estimators based on large order statistics of the runs of 0's and 1's in the estimation of the success probability associated with a sequence of independent Bernoulli trials, when this probability might be changing. Through theoretical arguments as well as Monte Carlo simulations, we show that appropriate linear combinations of these statistics offer the ability to follow, relatively rapidly, the underlying probability when it is changing monotonically. In order to define our estimator, we introduce a coefficient that can be used in testing the null hypothesis that the underlying success probability has remained constant throughout the sequence of independent Bernoulli trials.
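A sketch of the raw ingredients: extracting the run lengths of 1's and inverting the classical longest-run approximation $L_{\max} \approx \log n / \log(1/p)$. The paper's estimator uses linear combinations of several large order statistics of the runs; those weights and the associated coefficient are not reproduced here.

```python
import numpy as np
from itertools import groupby

def run_lengths(bits, value=1):
    """Lengths of maximal runs of `value` in a 0/1 sequence."""
    return np.array([len(list(g)) for v, g in groupby(bits) if v == value])

def longest_run_estimate(bits):
    """Crude estimate of p from the longest run of 1's via the classical
    approximation L_max = log(n)/log(1/p), i.e. p = n**(-1/L_max).
    (Sketch only; not the paper's weighted estimator.)"""
    L = run_lengths(bits).max()
    return len(bits) ** (-1.0 / L)

rng = np.random.default_rng(0)
bits = (rng.random(10000) < 0.7).astype(int)
print(sorted(run_lengths(bits))[-5:], longest_run_estimate(bits))
```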
We present a non-parametric statistic based on a linearity measure of the P-P plot for the two-sample problem by adapting a known statistic proposed for goodness of fit to a univariate parametric family. A Monte Carlo comparison is carried out to compare the method proposed with the classical Wilcoxon and Ansari-Bradley statistics and the Kolmogorov-Smirnov and Cramer-von Mises statistics for the two-sample problem, showing that, for certain relevant alternatives, the proposed method offers advantages, in terms of power, over its classical counterparts. Theoretically, the consistency of the statistic proposed is studied and a Central Limit Theorem is established for its distribution.
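A minimal sketch of the P-P plot linearity idea for two samples: evaluate both empirical CDFs on the pooled sample and measure how close the resulting plot is to a straight line, here via squared correlation. The paper's statistic and its scaling may differ.

```python
import numpy as np

def pp_linearity(x, y):
    """Squared correlation of the two-sample P-P plot points
    (F_m(z), G_n(z)) over the pooled sample; equals 1 for a perfectly
    linear plot. (Sketch of the idea only.)"""
    z = np.sort(np.concatenate([x, y]))
    Fm = np.searchsorted(np.sort(x), z, side="right") / len(x)
    Gn = np.searchsorted(np.sort(y), z, side="right") / len(y)
    return np.corrcoef(Fm, Gn)[0, 1] ** 2
```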
We discuss the application of linear combinations of the degree frequencies in the minimal spanning tree to the problem of identifying the appropriate dimension for a data set from its interpoint distance matrix. This graph-theoretical methodology, of very low computational cost, can be of aid in the problem of Multidimensional Scaling and in dimensionality reduction. Results of Lee [Lee, S. (1999). The central limit theorem for Euclidean minimal spanning trees II. Adv. Appl. Probability 31(4): 969–984] imply that the procedure proposed here is asymptotically consistent.
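The raw ingredient of the procedure is easy to compute with standard tools. A sketch follows; the linear-combination weights and the decision rule of the paper are not reproduced.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_degree_frequencies(X, max_degree=6):
    """Relative frequencies of vertex degrees in the Euclidean minimal
    spanning tree of the data (input to a dimension-identification rule)."""
    T = minimum_spanning_tree(squareform(pdist(X)))
    A = T.toarray()
    deg = ((A > 0) | (A.T > 0)).sum(axis=1)      # undirected vertex degrees
    return np.bincount(deg, minlength=max_degree + 1)[1:] / len(X)

X = np.random.default_rng(0).uniform(size=(1000, 3))
print(mst_degree_frequencies(X))   # frequencies of degrees 1, 2, ...
```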
Optical and infrared variability surveys produce a large number of high quality light curves. Statistical pattern recognition methods have provided competitive solutions for variable star classification at a relatively low computational cost. In order to perform supervised classification, a set of features is proposed and used to train an automatic classification system. Quantities related to the magnitude density of the light curves and their Fourier coefficients have been chosen as features in previous studies. However, some of these features are not robust to the presence of outliers and the calculation of Fourier coefficients is computationally expensive for large data sets. We propose and evaluate the performance of a new robust set of features using supervised classifiers in order to look for new Be star candidates in the OGLE-IV Gaia south ecliptic pole field. We calculated the proposed set of features on six types of variable stars and on a set of Be star candidates reported in the literature. We evaluated the performance of these features using classification trees and random forests along with K-nearest neighbours, support vector machines, and gradient boosted trees methods. We tuned the classifiers with a 10-fold cross-validation and grid search. We validated the performance of the best classifier on a set of OGLE-IV light curves and applied this to find new Be star candidates. The random forest classifier outperformed the others. By using the random forest classifier and colour criteria we found 50 Be star candidates in the direction of the Gaia south ecliptic pole field, four of which have infrared colours consistent with Herbig Ae/Be stars. Supervised methods are very useful in order to obtain preliminary samples of variable stars extracted from large databases. As usual, the stars classified as Be star candidates must be checked for the colours and spectroscopic characteristics expected for them.
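A minimal sketch of the model-selection step described above: a random forest tuned by 10-fold cross-validated grid search. The feature matrix, class labels, and grid values below are stand-ins, not the paper's actual light-curve features or tuning grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))            # stand-in for light-curve features
y = rng.integers(0, 6, size=600)         # stand-in for six variability classes

grid = {"n_estimators": [200, 500], "max_features": ["sqrt", 0.5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```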
High quality risk adjustment in health insurance markets weakens insurer incentives to engage in inefficient behavior to attract lower-cost enrollees. We propose a novel methodology based on Markov Chain Monte Carlo methods to improve risk adjustment by clustering diagnostic codes into risk groups optimal for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from 500 thousand enrollees of the Colombian Healthcare System. Results show that our methodology outperforms common alternatives and suggest that it has potential to improve access to quality healthcare for the chronically ill.
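A toy sketch of the search idea: Metropolis-style moves of single diagnostic codes between risk groups, accepting moves that improve a predictive-fit score and, with small probability, ones that worsen it. The scoring function below is a deterministic placeholder, not the paper's expenditure model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_codes, n_groups = 40, 5
cost = rng.gamma(2.0, size=n_codes)              # toy per-code mean cost
codes = rng.integers(0, n_groups, size=n_codes)  # current grouping of codes

def fit_score(grouping):
    # Within-group variance of per-code costs: a toy proxy for how well
    # groups of similar-cost codes would predict expenditures (lower is better).
    return sum(cost[grouping == g].var() for g in range(n_groups)
               if (grouping == g).sum() > 1)

current = fit_score(codes)
for _ in range(2000):
    proposal = codes.copy()
    proposal[rng.integers(n_codes)] = rng.integers(n_groups)    # move one code
    new = fit_score(proposal)
    if new < current or rng.random() < np.exp((current - new) / 0.05):
        codes, current = proposal, new                          # Metropolis accept
print(current, codes)
```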
In the multivariate statistics community, it is commonly acknowledged that, among the hierarchical clustering tree (HCT) procedures, the single linkage rule for inter-cluster distance tends to produce trees which are significantly more asymmetric than those obtained using other rules such as complete linkage, for instance. We consider the use of Shannon's entropy of the partitions determined by HCTs as a measure of the asymmetry of the clustering trees. In a different direction, our simulations show an unexpected relationship between Shannon's entropy of partitions and the dimension of the data. Based on this observation, a procedure for intrinsic dimension identification based on entropy of partitions is proposed and studied. A theoretical result is established for the dimension identification method stating that, locally, for continuous data on a d-dimensional manifold, the entropy of partitions behaves as if the local data were uniformly sampled from the unit ball of $\mathbb{R}^d$. Evaluation on simulated examples shows that the method proposed compares favorably with other procedures for dimension identification available in the literature.
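A sketch of the entropy computation on which the method builds: cut a hierarchical clustering tree into a fixed number of groups and compute Shannon's entropy of the resulting partition (single linkage typically yields unbalanced partitions, hence lower entropy). The dimension-identification rule of the paper is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def partition_entropy(X, n_clusters=8, method="single"):
    """Shannon entropy of the partition obtained by cutting an HCT into
    n_clusters groups (sketch only)."""
    labels = fcluster(linkage(X, method=method), n_clusters, criterion="maxclust")
    p = np.bincount(labels)[1:] / len(labels)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

X = np.random.default_rng(0).uniform(size=(500, 4))
print(partition_entropy(X, method="single"), partition_entropy(X, method="complete"))
```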
It is shown how tools from the area of Model Theory, specifically from the Theory of o-minimality, can be used to prove that a class of functions is VC-subgraph (in the sense of Dudley, 1987), and therefore satisfies a uniform polynomial metric entropy bound. We give examples where the use of these methods significantly improves the existing metric entropy bounds. The methods proposed here can be applied to finite dimensional parametric families of functions without the need for the parameters to live in a compact set, as is sometimes required in theorems that produce similar entropy bounds (for instance Theorem 19.7 of van der Vaart, 1998).
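For reference, the uniform polynomial metric entropy bound alluded to above, in one standard form; constants differ across references, so the displayed constants should be treated as indicative only:

```latex
% For a VC-subgraph class $\mathcal{F}$ of index $V$ with measurable
% envelope $F$, and any probability measure $Q$ with $\|F\|_{Q,2} > 0$,
\[
  N\bigl(\varepsilon \|F\|_{Q,2},\, \mathcal{F},\, L_2(Q)\bigr)
  \;\le\; K\, V\, (16e)^{V} \left(\frac{1}{\varepsilon}\right)^{2(V-1)},
  \qquad 0 < \varepsilon < 1,
\]
% for a universal constant $K$; in particular the metric entropy grows
% only logarithmically, $\log N = O(\log(1/\varepsilon))$.
```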
The use of quadratic forms of the empirical process for the two-sample problem in the context of functional data is considered. The convergence of the family of statistics proposed to a Gaussian limit is established under metric entropy conditions for smooth functional data. The applicability of the proposed methodology is evaluated in examples.
An importance sampling and bagging approach to solving the support vector machine (SVM) problem in the context of large databases is presented and evaluated. Our algorithm builds on the nearest neighbors ideas presented in Camelo et al. (2015). As in that reference, the goal of the present proposal is to achieve a faster solution of the SVM problem without a significant loss in the prediction error. The performance of the methodology is evaluated in benchmark examples and theoretical aspects of subsample methods are discussed.
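A sketch of the subsample-and-bag part of the approach: train several SVMs on small random subsamples of a large training set and aggregate their votes. The importance-sampling step (reweighting points near the decision boundary, following the cited nearest-neighbors ideas) is replaced here by plain uniform subsampling, and binary 0/1 labels are assumed.

```python
import numpy as np
from sklearn.svm import SVC

def bagged_svm_predict(X, y, X_test, n_models=10, subsample=1000, seed=0):
    """Majority vote of SVMs trained on random subsamples (sketch only;
    assumes labels in {0, 1})."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.choice(len(X), size=min(subsample, len(X)), replace=False)
        clf = SVC(kernel="rbf").fit(X[idx], y[idx])   # SVM on a small subsample
        votes += clf.predict(X_test)
    return (votes / n_models > 0.5).astype(int)
```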
Three different permutation test schemes are discussed and compared in the context of the two-sample problem for functional data. One of the procedures was essentially introduced by Lopez-Pintado and Romo (2009), using notions of functional data depth to adapt the ideas originally proposed by Liu and Singh (1993) for multivariate data. Of the new methods introduced here, one is also based on functional data depths, but uses a different way (inspired by Meta-Analysis) to assess the significance of the depth differences. The second new method presented here adapts, to the functional data setting, the k-nearest-neighbors statistic of Schilling (1986). The three methods are compared with one another and against the test of Horvath and Kokoszka (2012) in simulated examples and real data. The comparison considers the performance of the statistics in terms of statistical power and in terms of computational cost.
Distribution-Free Statistics. Power Functions and Their Properties. Asymptotic Relative Efficiency of Tests. Confidence Intervals and Bounds. Point Estimation. Linear Rank Statistics Under the Null Hypothesis. Two-Sample Location and Scale Problems. The One-Sample Location Problem. Additional Methods for Constructing Distribution-Free Procedures. Other Important Problems. Appendix. Index.
Multivariate generalizations of the Wald-Wolfowitz runs statistic and the Smirnov maximum deviation statistic for the two-sample problem are presented. They are based on the minimal spanning tree of the pooled sample points. Some null distribution results are derived and a simulation study of power is reported.
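A sketch of the multivariate runs statistic: build the minimal spanning tree of the pooled sample and count the edges that join points from different samples; few cross-sample edges are evidence against equality of the distributions. Null calibration (e.g., by permutation) is omitted here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_cross_edges(A, B):
    """Number of MST edges joining the two samples (sketch only)."""
    pooled = np.vstack([A, B])
    labels = np.r_[np.zeros(len(A)), np.ones(len(B))]
    T = minimum_spanning_tree(squareform(pdist(pooled)))
    i, j = T.nonzero()                       # the n-1 MST edges
    return int((labels[i] != labels[j]).sum())
```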
Part 1 Preliminaries: construction of symmetric multivariate distributions; notation of algebraic entities and characteristics of random quantities; the d operator; groups and invariance; Dirichlet distribution; problems. Part 2 Spherically and elliptically symmetric distributions: introduction and definition; marginal distributions, moments and density; the relationship between $\phi$ and $f$; conditional distributions; properties of elliptically symmetric distributions; mixtures of normal distributions; robust statistics and regression model; log-elliptical and additive logistic elliptical distributions; complex elliptically symmetric distributions. Part 3 Some subclasses of elliptical distributions: multiuniform distribution (the characteristic function; moments; marginal distribution; conditional distributions; uniform distribution in the unit sphere; discussion); symmetric Kotz type distributions (definition; distribution of $R^2$; moments; multivariate normal distributions; the c.f. of Kotz type distributions); symmetric multivariate Pearson type VII distributions (definition; marginal densities; conditional distributions; moments; some examples); extended $T_n$ family; relationships between $L_n$ and $T_n$ families of distributions; order statistics; mixtures of exponential distributions; independence, robustness and characterizations; problems. Part 6 Multivariate Liouville distributions: definitions and properties; examples; marginal distributions; conditional distribution; characterizations; scale-invariant statistics; survival functions; inequalities and applications.
The empirical measure $P_n$ for independent sampling on a distribution $P$ is formed by placing mass $n^{-1}$ at each of the first $n$ sample points. In this paper, $n^{1/2}(P_n - P)$ is regarded as a stochastic process indexed by a family of square integrable functions. A functional central limit theorem is proved for this process. The statement of this theorem involves a new form of combinatorial entropy, definable for classes of square integrable functions.
Let $(X, \mathscr{A})$ be a measurable space and $\mathscr{F}$ a class of measurable functions on $X$. $\mathscr{F}$ is called a universal Donsker class if for every probability measure $P$ on $\mathscr{A}$, the centered and normalized empirical measures $n^{1/2}(P_n - P)$ converge in law, with respect to uniform convergence over $\mathscr{F}$, to the limiting "Brownian bridge" process $G_P$. Then up to additive constants, $\mathscr{F}$ must be uniformly bounded. Several nonequivalent conditions are shown to imply the universal Donsker property. Some are connected with the Vapnik-Cervonenkis combinatorial condition on classes of sets, others with metric entropy. The implications between the various conditions are considered. Bounds are given for the metric entropy of convex hulls in Hilbert space.
A new class of simple tests is proposed for the general multivariate two-sample problem based on the (possibly weighted) proportion of all k nearest neighbor comparisons in which observations and their neighbors belong to the same sample. Large values of the test statistics give evidence against the hypothesis H of equality of the two underlying distributions. Asymptotic null distributions are explicitly determined and shown to involve certain nearest neighbor interaction probabilities. Simple infinite-dimensional approximations are supplied. The unweighted version yields a distribution-free test that is consistent against all alternatives; optimally weighted statistics are also obtained and asymptotic efficiencies are calculated. Each of the tests considered is easily adapted to a permutation procedure that conditions on the pooled sample. Power performance for finite sample sizes is assessed in simulations. Key Words: Distribution-free; kth nearest neighbor; Infinite-dimensional approximation.
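A sketch of the unweighted statistic: the proportion, over all pooled points and their k nearest neighbours, of neighbour pairs belonging to the same sample; values well above the chance level point to different distributions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def schilling_statistic(A, B, k=3):
    """Proportion of same-sample pairs among all k-nearest-neighbour
    comparisons in the pooled sample (sketch only)."""
    pooled = np.vstack([A, B])
    labels = np.r_[np.zeros(len(A)), np.ones(len(B))]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(pooled)
    _, idx = nn.kneighbors(pooled)           # idx[:, 0] is the point itself
    same = labels[idx[:, 1:]] == labels[:, None]
    return same.mean()
```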
One of the ways in which functional data analysis differs from other areas of statistics is in the extent to which data are pre-processed prior to analysis. Functional data are invariably recorded discretely, although they are generally substantially smoothed as a prelude even to viewing by statisticians, let alone further analysis. This has a potential to interfere with the performance of two-sample statistical tests, since the use of different tuning parameters for the smoothing step, or different observation times or subsample sizes (i.e., numbers of observations per curve), can mask the differences between distributions that a test is trying to locate. In this paper, and in the context of two-sample tests, we take up this issue. Ways of pre-processing the data, so as to minimise the effects of smoothing, are suggested. We show theoretically and numerically that, by employing exactly the same tuning parameter (e.g. bandwidth) to produce each curve from its raw data, significant loss of power can be avoided. Provided a common tuning parameter is used, it is often satisfactory to choose that parameter along conventional lines, as though the target was estimation of the continuous functions themselves, rather than testing hypotheses about them. Moreover, in this case, using a second-order smoother (such as a local-linear method), the subsample sizes can be almost as small as the square root of sample sizes before the effects of smoothing have any first-order impact on the results of a two-sample test.
Necessary and sufficient conditions for the law of large numbers and sufficient conditions for the central limit theorem for $U$-processes are given. These conditions are in terms of random metric entropies. The CLT and LLN for VC subgraph classes of functions as well as for classes satisfying bracketing conditions follow as consequences of the general results. In particular, Liu's simplicial depth process satisfies both the LLN and the CLT. Among the techniques used, randomization, decoupling inequalities, integrability of Gaussian and Rademacher chaos and exponential inequalities for $U$-statistics should be mentioned.
N. J. H. Small (1978). Plotting squared radii. Biometrika 65(3): 657–658. https://doi.org/10.1093/biomet/65.3.657
The intrinsic dimensionality of a set of patterns is important in determining an appropriate number of features for representing the data and whether a reasonable two- or three-dimensional representation of the data exists. We propose an intuitively appealing, noniterative estimator for intrinsic dimensionality which is based on near-neighbor information. We give plausible arguments supporting the consistency of this estimator. The method works well in identifying the true dimensionality for a variety of artificial data sets and is fairly insensitive to the number of samples and to the algorithmic parameters. Comparisons between this new method and the global eigenvalue approach demonstrate the utility of our estimator.
Functional principal component analysis (FPCA) based on the Karhunen-Loève decomposition has been successfully applied in many applications, mainly for one sample problems. In this paper we consider common functional principal components for two sample problems. Our research is motivated not only by the theoretical challenge of this data situation, but also by the actual question of dynamics of implied volatility (IV) functions. For different maturities the log-returns of IVs are samples of (smooth) random functions and the methods proposed here study the similarities of their stochastic behavior. First we present a new method for estimation of functional principal components from discrete noisy data. Next we present the two sample inference for FPCA and develop the two sample theory. We propose bootstrap tests for testing the equality of eigenvalues, eigenfunctions, and mean functions of two functional samples, illustrate the test-properties by simulation study and apply the method to the IV analysis.
New technologies allow us to handle increasingly large datasets, while monitoring devices are becoming ever more sophisticated. This high-tech progress produces statistical units sampled over finer an…
Let $Y_{i}$, $i\geq1$, be i.i.d. random variables having values in an $m$-dimensional manifold $\mathcal{M}\subset\mathbb{R}^{d}$ and consider sums $\sum_{i=1}^{n}\xi(n^{1/m}Y_{i},\{n^{1/m}Y_{j}\}_{j=1}^{n})$, where $\xi$ is a real valued function defined on pairs $(y,\mathcal{Y})$, with $y\in\mathbb{R}^{d}$ and $\mathcal{Y}\subset\mathbb{R}^{d}$ locally finite. Subject to $\xi$ satisfying a weak spatial dependence and continuity condition, we show that such sums satisfy weak laws of large numbers, variance asymptotics and central limit theorems. We show that the limit behavior is controlled by the value of $\xi$ on homogeneous Poisson point processes on $m$-dimensional hyperplanes tangent to $\mathcal{M}$. We apply the general results to establish the limit theory of dimension and volume content estimators, Rényi and Shannon entropy estimators and clique counts in the Vietoris–Rips complex on $\{Y_{i}\}_{i=1}^{n}$.
The univariate weak convergence theorem of Murota and Takeuchi (1981) is extended for the Mahalanobis transform of the $d$-variate empirical characteristic function, $d \geq 1$. Then a maximal deviation statistic is proposed for testing the composite hypothesis of $d$-variate normality. Fernique's inequality is used in conjunction with a combination of analytic, numerical analytic, and computer techniques to derive exact upper bounds for the asymptotic percentage points of the statistic. The resulting conservative large sample test is shown to be consistent against every alternative with components having a finite variance. (If $d = 1$ it is consistent against every alternative.) Monte Carlo experiments and the performance of the test on some well-known data sets are also discussed.
Let $(B_n)$ be an increasing sequence of regions in $d$-dimensional space with volume $n$ and with union $\mathbb{R}^d$. We prove a general central limit theorem for functionals of point sets, obtained either by restricting a homogeneous Poisson process to $(B_n)$, or by taking $n$ uniformly distributed points in $(B_n)$. The sets $(B_n)$ could be all cubes but a more general class of regions $(B_n)$ is considered. Using this general result we obtain central limit theorems for specific functionals such as total edge length and number of components, defined in terms of graphs such as the $k$-nearest neighbors graph, the sphere of influence graph and the Voronoi graph.
Methods of adaptive smoothing of density estimates, where the amount of smoothing applied varies according to local features of the underlying density, are investigated. The difficulties of applying Taylor series arguments in this context are explored. Simple properties of the estimates are investigated by numerical integration and compared with the fixed kernel approach. Optimal smoothing strategies, based on the multivariate Normal distribution, are derived. As an application of these techniques, two tests of multivariate Normality, one based on integrated squared error and one on entropy, are developed, and some power calculations are carried out.
A data depth can be used to measure the "depth" or "outlyingness" of a given multivariate sample with respect to its underlying distribution. This leads to a natural center-outward ordering of the sample points. Based on this ordering, quantitative and graphical methods are introduced for analyzing multivariate distributional characteristics such as location, scale, bias, skewness and kurtosis, as well as for comparing inference methods. All graphs are one-dimensional curves in the plane and can be easily visualized and interpreted. A "sunburst plot" is presented as a bivariate generalization of the box-plot. DD-(depth versus depth) plots are proposed and examined as graphical inference tools. Some new diagnostic tools for checking multivariate normality are introduced. One of them monitors the exact rate of growth of the maximum deviation from the mean, while the others examine the ratio of the overall dispersion to the dispersion of a certain central region. The affine invariance property of a data depth also leads to appropriate invariance properties for the proposed statistics and methods.
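A sketch of a DD-plot's raw material: a depth of every pooled point with respect to each of two samples, to be plotted against each other; points drifting off the diagonal indicate distributional differences. Mahalanobis depth is used here as one convenient affine-invariant choice among the many depths available.

```python
import numpy as np

def mahalanobis_depth(z, X):
    """Mahalanobis depth of the rows of z with respect to sample X."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X.T))
    d2 = ((z - mu) @ S_inv * (z - mu)).sum(axis=1)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(200, 2)), rng.normal(loc=0.7, size=(200, 2))
pooled = np.vstack([A, B])
dd = np.c_[mahalanobis_depth(pooled, A), mahalanobis_depth(pooled, B)]
print(dd[:5])   # (depth in A, depth in B) pairs; plot these for the DD-plot
```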
Let $V_{k,n}$ be the number of vertices of degree $k$ in the Euclidean minimal spanning tree of $X_1, \ldots, X_n$, where the $X_i$ are independent, absolutely continuous random variables with values in $\mathbb{R}^d$. It is proved that $n^{-1}V_{k,n}$ converges with probability 1 to a constant $\alpha_{k,d}$. Intermediate results provide information about how the vertex degrees of a minimal spanning tree change as points are added or deleted, about the decomposition of minimal spanning trees into probabilistically similar trees, and about the mean and variance of $V_{k,n}$.
We propose a method of comparing two functional linear models in which explanatory variables are functions (curves) and responses can be either scalars or functions. In such models, the role of parameter vectors (or matrices) is played by integral operators acting on a function space. We test the null hypothesis that these operators are the same in two independent samples. The complexity of the test statistics increases as we move from scalar to functional responses and relax assumptions on the covariance structure of the regressors. They all, however, have an asymptotic chi-squared distribution with the number of degrees of freedom which depends on a specific setting. The test statistics are readily computable using the R package fda, and have good finite sample properties. The test is applied to egg-laying curves of Mediterranean flies and to data from terrestrial magnetic observatories.
A new technique called the generalized gap test for the detection of multivariate outliers is presented. It is based on the observation that the distribution of the lengths of the edges of minimum spanning trees (based on a matrix of distances between all pairs of points) is quite sensitive to the presence of observations separated from the main multivariate cloud of points. If the data are multivariate normal then the distribution of squared edge lengths follows the gamma distribution quite closely. Thus departure from expectation can be detected using gamma quantile plots. A table of critical values is also given for testing whether the maximum squared edge length divided by the mean squared edge length is too large.
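A sketch of the test's headline ratio: the maximum squared MST edge length divided by the mean squared edge length, where large values flag observations separated from the main cloud. The critical values tabled in the paper are not reproduced here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def gap_ratio(X):
    """Max over mean of the squared MST edge lengths (sketch only)."""
    T = minimum_spanning_tree(squareform(pdist(X)))
    sq = T.data ** 2                 # squared lengths of the n-1 MST edges
    return sq.max() / sq.mean()

X = np.random.default_rng(0).normal(size=(200, 3))
X[0] += 10                           # plant one outlier
print(gap_ratio(X))
```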
K. V. Mardia (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3): 519–530. https://doi.org/10.1093/biomet/57.3.519
We present a new method to estimate the intrinsic dimensionality of a submanifold $M$ in $\mathbb{R}^d$ from random samples. The method is based on the convergence rates of a certain $U$-statistic on the manifold. We solve at least partially the question of the choice of the scale of the data. Moreover the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.
Han's maximum rank correlation (MRC) estimator is shown to be $\sqrt{n}$-consistent and asymptotically normal. The proof rests on a general method for determining the asymptotic distribution of a maximization estimator, a simple U-statistic decomposition, and a uniform bound for degenerate U-processes. A consistent estimator of the asymptotic covariance matrix is provided, along with a result giving the explicit form of this matrix for any model within the scope of the MRC estimator. The latter result is applied to the binary choice model, and it is found that the MRC estimator does not achieve the semiparametric efficiency bound.
Let $n$ points be placed independently in $\nu$-dimensional space according to the standard $\nu$-dimensional normal distribution. Let $M_n$ be the longest edge-length of the minimal spanning tree on these points; equivalently let $M_n$ be the infimum of those $r$ such that the union of balls of radius $r/2$ centred at the points is connected. We show that the distribution of $(2 \log n)^{1/2} M_n - b_n$ converges weakly to the Gumbel (double exponential) distribution, where $b_n$ are explicit constants with $b_n \sim (\nu - 1)\log \log n$. We also show the same result holds if $M_n$ is the longest edge-length for the nearest neighbour graph on the points.
Several classes of functions are shown to be Donsker by an argument based on partitioning the sample space. One example is the class of all nondecreasing functions $f: \mathbb{R} \to \mathbb{R}$ such that $0 \leq f \leq F$ for a given function $F$ with $\int F^2 \, dP/\sqrt{1-P} < \infty$.
We investigate properties of a bootstrap-based methodology for testing hypotheses about equality of certain characteristics of the distributions between different populations in the context of functional data. The suggested testing methodology is simple and easy to implement. It resamples the original dataset in such a way that the null hypothesis of interest is satisfied and it can be potentially applied to a wide range of testing problems and test statistics of interest. Furthermore, it can be utilized in the case where more than two populations of functional data are considered. We illustrate the bootstrap procedure by considering the important problems of testing the equality of mean functions or the equality of covariance functions (resp. covariance operators) between two populations. Theoretical results that justify the validity of the suggested bootstrap-based procedure are established. Furthermore, simulation results demonstrate very good size and power performances in finite sample situations, including the case of testing problems and/or sample sizes where asymptotic considerations do not lead to satisfactory approximations. A real-life dataset analyzed in the literature is also examined.
1. Preface. This is an expository paper giving an account of the goodness of fit test and the two sample test based on the empirical distribution function, tests which were initiated by the four authors cited in the title. An attempt is made here to give a fairly complete coverage of the history, development, present status, and outstanding current problems related to these topics. The reader is advised that the relative amount of space and emphasis allotted to the various phases of the subject does not necessarily reflect their intrinsic merit and importance, but rather the author's personal interest and familiarity. Also, for the sake of uniformity the notation of many of the writers quoted has been altered, so that when referring to the original papers it will be necessary to check their nomenclature. 2. The empirical distribution function and the tests. Let $X_1, X_2, \cdots, X_n$ be independent random variables (observations) each having the same distribution
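The classical EDF tests surveyed here are available in standard software; for example, using scipy (an illustration, not part of the original paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0.2, size=100)

# One-sample Kolmogorov-Smirnov goodness-of-fit test against N(0, 1):
# the statistic is the sup-distance between the empirical and the
# hypothesized distribution functions.
print(stats.kstest(x, "norm"))

# Two-sample Smirnov test comparing two empirical distribution functions
y = rng.normal(loc=0.0, size=120)
print(stats.ks_2samp(x, y))
```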
Ratios of the form $(x_n - x_{n-j})/(x_n - x_i)$ for small values of $i$ and $j$ and $n = 3, \cdots, 30$ are discussed. The variables concerned are order statistics, i.e., sample values ordered so that $x_1 < x_2 < \cdots < x_n$. Analytic results are obtained for the distributions of these ratios for several small values of $n$, and percentage points are tabled for these distributions for samples of size $n \leqq 30$.
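A brief illustration of computing one such ratio, for the familiar case $i = j = 1$; the percentage points themselves come from the paper's tables and are not reproduced here:

```python
import numpy as np

def extreme_ratio(sample, i=1, j=1):
    """Ratio (x_n - x_{n-j}) / (x_n - x_i) of the ordered sample,
    with 1-based indices i and j as in the paper. Used to screen
    the largest observation as a possible outlier."""
    x = np.sort(sample)
    n = len(x)
    return (x[n - 1] - x[n - 1 - j]) / (x[n - 1] - x[i - 1])

# i = j = 1 gives (x_n - x_{n-1}) / (x_n - x_1), the classical Dixon ratio
sample = np.array([3.1, 2.9, 3.0, 3.2, 4.8])
print("r10 =", extreme_ratio(sample))  # large values flag x_n as an outlier
```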
Percentage-percentage (P-P) probability plots constructed from standardized observations possess some attractive features that are not shared by the more commonly used quantile-quantile (Q-Q) plots. In particular, the identification of viable alternatives to a proposed probability model can be greatly facilitated by displaying curves on P-P plots to represent families of alternative models. A single curve can represent an entire family of alternatives indexed by both location and scale parameters. Two goodness-of-fit statistics, based on measures of linearity for standardized P-P plots, are proposed, and simple approximations to the percentage points of these statistics are presented for testing the fit of exponential, Gumbel (Weibull), and normal (lognormal) probability models with unknown parameters. Results of extensive Monte Carlo power comparisons with other goodness-of-fit tests are summarized. The proposed tests are shown to have superior power for detecting light-tailed and moderate-tailed alternatives to...
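A minimal sketch of a standardized P-P plot for the normal model, together with a correlation-type measure of linearity in the spirit of the proposed statistics; the exact statistics and their percentage points are given in the paper:

```python
import numpy as np
from scipy import stats

def pp_points(sample, dist=stats.norm):
    """Coordinates of a standardized P-P plot: empirical probabilities
    (i - 0.5)/n against the model CDF evaluated at the ordered,
    standardized observations."""
    x = np.sort(sample)
    z = (x - x.mean()) / x.std(ddof=1)   # standardize with sample estimates
    n = len(x)
    emp = (np.arange(1, n + 1) - 0.5) / n
    return dist.cdf(z), emp

rng = np.random.default_rng(4)
theo, emp = pp_points(rng.normal(size=200))
# For data from the hypothesized family the points fall near the 45-degree
# line; a correlation-type measure of this linearity gives a test statistic.
print("linearity (corr):", np.corrcoef(theo, emp)[0, 1])
```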
This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators--intermediaries between sample mean and sample median--that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960), p. 448 ff. Let $x_1, \cdots, x_n$ be independent random variables with common distribution function $F(t - \xi)$. The problem is to estimate the location parameter $\xi$, but with the complication that the prototype distribution $F(t)$ is only approximately known. I shall primarily be concerned with the model of indeterminacy $F = (1 - \epsilon)\Phi + \epsilon H$, where $0 \leqq \epsilon < 1$ is a known number, $\Phi(t) = (2\pi)^{-\frac{1}{2}} \int^t_{-\infty} \exp(-\frac{1}{2}s^2)\, ds$ is the standard normal cumulative and $H$ is an unknown contaminating distribution. This model arises for instance if the observations are assumed to be normal with variance 1, but a fraction $\epsilon$ of them is affected by gross errors. Later on, I shall also consider other models of indeterminacy, e.g., $\sup_t |F(t) - \Phi(t)| \leqq \epsilon$. Some inconvenience is caused by the fact that location and scale parameters are not uniquely determined: in general, for fixed $\epsilon$, there will be several values of $\xi$ and $\sigma$ such that $\sup_t|F(t) - \Phi((t - \xi)/\sigma)| \leqq \epsilon$, and similarly for the contaminated case. Although this inherent and unavoidable indeterminacy is small if $\epsilon$ is small and is rather irrelevant for practical purposes, it poses awkward problems for the theory, especially for optimality questions. To remove this difficulty, one may either (i) restrict attention to symmetric distributions, and estimate the location of the center of symmetry (this works for $\xi$ but not for $\sigma$); or (ii) one may define the parameter to be estimated in terms of the estimator itself, namely by its asymptotic value for sample size $n \rightarrow \infty$; or (iii) one may define the parameters by arbitrarily chosen functionals of the distribution (e.g., by the expectation, or the median of $F$). All three possibilities have unsatisfactory aspects, and I shall usually choose the variant which is mathematically most convenient. It is interesting to look back to the very origin of the theory of estimation, namely to Gauss and his theory of least squares. Gauss was fully aware that his main reason for assuming an underlying normal distribution and a quadratic loss function was mathematical, i.e., computational, convenience. In later times, this was often forgotten, partly because of the central limit theorem. However, if one wants to be honest, the central limit theorem can at most explain why many distributions occurring in practice are approximately normal. The stress is on the word "approximately." This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): What happens if the true distribution deviates slightly from the assumed normal one?
As is now well known, the sample mean then may have a catastrophically bad performance: seemingly quite mild deviations may already explode its variance. Tukey and others proposed several more robust substitutes--trimmed means, Winsorized means, etc.--and explored their performance for a few typical violations of normality. A general theory of robust estimation is still lacking; it is hoped that the present paper will furnish the first few steps toward such a theory. At the core of the method of least squares lies the idea of minimizing the sum of the squared "errors," that is, of adjusting the unknown parameters such that the sum of the squares of the differences between observed and computed values is minimized. In the simplest case, with which we are concerned here, namely the estimation of a location parameter, one has to minimize the expression $\sum_i (x_i - T)^2$; this is of course achieved by the sample mean $T = \sum_i x_i/n$. I should like to emphasize that no loss function is involved here; I am only describing how the least squares estimator is defined, and neither the underlying family of distributions nor the true value of the parameter to be estimated enters so far. It is quite natural to ask whether one can obtain more robustness by minimizing another function of the errors than the sum of their squares. We shall therefore concentrate our attention on estimators that can be defined by a minimum principle of the form (for a location parameter): $$T = T_n(x_1, \cdots, x_n) \text{ minimizes } \sum_i \rho(x_i - T), \tag{M}$$ where $\rho$ is a non-constant function. Of course, this definition generalizes at once to more general least squares type problems, where several parameters have to be determined. This class of estimators contains in particular (i) the sample mean $(\rho(t) = t^2)$, (ii) the sample median $(\rho(t) = |t|)$, and more generally, (iii) all maximum likelihood estimators $(\rho(t) = -\log f(t)$, where $f$ is the assumed density of the untranslated distribution). These ($M$)-estimators, as I shall call them for short, have rather pleasant asymptotic properties; sufficient conditions for asymptotic normality and an explicit expression for their asymptotic variance will be given. How should one judge the robustness of an estimator $T_n(x) = T_n(x_1, \cdots, x_n)$? Since ill effects from contamination are mainly felt for large sample sizes, it seems that one should primarily optimize large sample robustness properties. Therefore, a convenient measure of robustness for asymptotically normal estimators seems to be the supremum of the asymptotic variance $(n \rightarrow \infty)$ when $F$ ranges over some suitable set of underlying distributions, in particular over the set of all $F = (1 - \epsilon)\Phi + \epsilon H$ for fixed $\epsilon$ and symmetric $H$. On second thought, it turns out that the asymptotic variance is not only easier to handle, but that even for moderate values of $n$ it is a better measure of performance than the actual variance, because (i) the actual variance of an estimator depends very much on the behavior of the tails of $H$, and the supremum of the actual variance is infinite for any estimator whose value is always contained in the convex hull of the observations; (ii) if an estimator is asymptotically normal, then the important central part of its distribution and confidence intervals for moderate confidence levels can better be approximated in terms of the asymptotic variance than in terms of the actual variance.
If we adopt this measure of robustness, and if we restrict attention to ($M$)-estimators, then it will be shown that the most robust estimator is uniquely determined and corresponds to the following $\rho$: $\rho(t) = \frac{1}{2}t^2$ for $|t| < k$, $\rho(t) = k|t| - \frac{1}{2}k^2$ for $|t| \geqq k$, with $k$ depending on $\epsilon$. This estimator is most robust even among all translation invariant estimators. Sample mean $(k = \infty)$ and sample median $(k = 0)$ are limiting cases corresponding to $\epsilon = 0$ and $\epsilon = 1$, respectively, and the estimator is closely related and asymptotically equivalent to Winsorizing. I recall the definition of Winsorizing: assume that the observations have been ordered, $x_1 \leqq x_2 \leqq \cdots \leqq x_n$; then the statistic $T = n^{-1}(gx_{g + 1} + x_{g + 1} + x_{g + 2} + \cdots + x_{n - h} + hx_{n - h})$ is called the Winsorized mean, obtained by Winsorizing the $g$ leftmost and the $h$ rightmost observations. The above most robust ($M$)-estimators can be described by the same formula, except that in the first and in the last summand, the factors $x_{g + 1}$ and $x_{n - h}$ have to be replaced by some numbers $u, v$ satisfying $x_g \leqq u \leqq x_{g + 1}$ and $x_{n - h} \leqq v \leqq x_{n - h + 1}$, respectively; $g, h, u$ and $v$ depend on the sample. In fact, this ($M$)-estimator is the maximum likelihood estimator corresponding to a unique least favorable distribution $F_0$ with density $f_0(t) = (1 - \epsilon)(2\pi)^{-\frac{1}{2}}e^{-\rho(t)}$. This $f_0$ behaves like a normal density for small $t$, like an exponential density for large $t$. At least for me, this was rather surprising--I would have expected an $f_0$ with much heavier tails. This result is a particular case of a more general one that can be stated roughly as follows: Assume that $F$ belongs to some convex set $C$ of distribution functions. Then the most robust ($M$)-estimator for the set $C$ coincides with the maximum likelihood estimator for the unique $F_0 \in C$ which has the smallest Fisher information number $I(F) = \int (f'/f)^2 f\, dt$ among all $F \in C$. Miscellaneous related problems will also be treated: the case of non-symmetric contaminating distributions; the most robust estimator for the model of indeterminacy $\sup_t|F(t) - \Phi(t)| \leqq \epsilon$; robust estimation of a scale parameter; how to estimate location, if scale and $\epsilon$ are unknown; numerical computation of the estimators; more general estimators, e.g., minimizing $\sum_{i < j} \rho(x_i - T, x_j - T)$, where $\rho$ is a function of two arguments. Questions of small sample size theory will not be touched in this paper.
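The minimizing $\rho$ above is the now-standard Huber loss; a minimal sketch of the corresponding location ($M$)-estimate, assuming unit scale and using a commonly used tuning constant $k$:

```python
import numpy as np

def huber_rho(t, k=1.345):
    """Huber's rho: quadratic near zero, linear in the tails."""
    return np.where(np.abs(t) < k, 0.5 * t**2, k * np.abs(t) - 0.5 * k**2)

def huber_psi(t, k=1.345):
    """Derivative of rho: clips the residual at +/- k."""
    return np.clip(t, -k, k)

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """M-estimate of location: solve sum_i psi(x_i - T) = 0
    by fixed-point iteration starting from the sample median."""
    t = np.median(x)
    for _ in range(max_iter):
        step = huber_psi(x - t, k).mean()
        if abs(step) < tol:
            break
        t += step
    return t

# 10% gross-error contamination of a standard normal sample
rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(size=90), rng.normal(0, 10, size=10)])
print("mean:", x.mean(), " median:", np.median(x), " Huber:", huber_location(x))
```

The estimate interpolates between the sample mean ($k = \infty$) and the sample median ($k = 0$), as the abstract describes.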
We present B, V, R, and Hα photometry of 8 clusters in the Small Magellanic Cloud, 5 in the Large Magellanic Cloud, and 3 Galactic clusters, and use two-color diagrams (2-CDs) to identify candidate Be star populations in these clusters. We find evidence that the Be phenomenon is enhanced in low-metallicity environments, based on the observed fractional early-type candidate Be star content of clusters of age 10-25 Myr. Numerous candidate Be stars of spectral types B0 to B5 were identified in clusters of age 5-8 Myr, challenging the suggestion of Fabregat & Torrejon (2000) that classical Be stars should only be found in clusters at least 10 Myr old. These results suggest that a significant number of B-type stars must emerge onto the zero-age main sequence as rapid rotators. We also detect an enhancement in the fractional content of early-type candidate Be stars in clusters of age 10-25 Myr, suggesting that the Be phenomenon does become more prevalent with evolutionary age. We briefly discuss the mechanisms that might contribute to such an evolutionary effect. A discussion of the limitations of utilizing the 2-CD technique to investigate the role evolutionary age and/or metallicity play in the development of the Be phenomenon is offered, and we provide evidence that other B-type objects of very different nature, such as candidate Herbig Ae/Be stars, may contaminate the claimed detections of "Be stars" via 2-CDs.