Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

Type: Preprint

Publication Date: 2024-11-19

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2411.12438

Abstract

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04] (in addition to several other applications). As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and samples for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query lower bound [DKS17] for clustering non-spherical Gaussian mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.

Locations

  • arXiv (Cornell University) - View - PDF

Similar Works

Action Title Year Authors
+ Outlier-Robust Clustering of Non-Spherical Mixtures 2020 Ainesh Bakshi
Pravesh K. Kothari
+ Outlier-Robust Clustering of Gaussians and Other Non-Spherical Mixtures 2020 Ainesh Bakshi
Ilias Diakonikolas
Samuel B. Hopkins
Daniel M. Kane
Sushrut Karmalkar
Pravesh K. Kothari
+ Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures 2014 Martin Azizyan
Aarti Singh
Larry Wasserman
+ Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures 2014 Martin Azizyan
Aarti Singh
Larry Wasserman
+ Mixture Models, Robustness, and Sum of Squares Proofs 2017 Samuel B. Hopkins
Jerry Li
+ Learning Mixtures of Gaussians using the k-means Algorithm 2009 Kamalika Chaudhuri
Sanjoy Dasgupta
Andrea Vattani
+ Recovery of a mixture of Gaussians by sum-of-norms clustering 2019 Tao Jiang
Stephen A. Vavasis
Chen Wen Zhai
+ Better Agnostic Clustering Via Relaxed Tensor Norms 2017 Pravesh K. Kothari
Jacob Steinhardt
+ Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures 2021 Rares-Darius Buhai
David Steurer
+ Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds 2017 Dan Kushnir
Shirin Jalali
Iraj Saniee
+ PDF Chat The Informativeness of $k$ -Means for Learning Mixture Models 2019 Zhaoqiang Liu
Vincent Y. F. Tan
+ Near-optimal-sample estimators for spherical Gaussian mixtures 2014 Jayadev Acharya
Ashkan Jafarpour
Alon Orlitsky
Ananda Theertha Suresh
+ Near-optimal-sample estimators for spherical Gaussian mixtures 2014 Jayadev Acharya
Ashkan Jafarpour
Alon Orlitsky
Ananda Theertha Suresh
+ Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation 2023 Ilias Diakonikolas
Daniel M. Kane
Jasper C. H. Lee
Thanasis Pittas
+ PDF Chat Sum-of-squares lower bounds for Non-Gaussian Component Analysis 2024 Ilias Diakonikolas
Sushrut Karmalkar
Shuo Pang
Aaron Potechin
+ List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians 2017 Ilias Diakonikolas
Daniel M. Kane
Alistair Stewart
+ List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians 2017 Ilias Diakonikolas
Daniel M. Kane
Alistair Stewart
+ Clustering with Spectral Norm and the k-means Algorithm 2010 Amit Kumar
Ravindran Kannan
+ Spherical clustering in detection of groups of concomitant extremes 2020 V. Fomichov
J. Ivanovs
+ Robustly Clustering a Mixture of Gaussians 2019 Jia He
Santosh Vempala

Works That Cite This (0)

Action Title Year Authors

Works Cited by This (0)

Action Title Year Authors